On the Fading Paper Achievable Region of the 
Fading MIMO Broadcast Channel 



Amir Bennatan, Member, IEEE and David Burshtein, Senior Member, IEEE 



Abstract 

We consider transmission over the ergodic fading multi-antenna broadcast (MIMO-BC) channel with
partial channel state information at the transmitter and full information at the receiver. Over the equivalent
non-fading channel, capacity has recently been shown to be achievable using transmission schemes that were
designed for the "dirty paper" channel. We focus on a similar "fading paper" model. The fading-paper
capacity is difficult to evaluate. We confine ourselves to the linear-assignment capacity, which we
define, and use convex analysis methods to prove that its maximizing distribution is Gaussian. We compare
our fading-paper transmission to an application of dirty-paper coding that ignores the partial state information
and assumes the channel is fixed at the average fade. We show that a gain is easily achieved by appropriately
exploiting the information. We also consider a cooperative upper bound on the sum-rate capacity as suggested
by Sato. We present a numeric example that indicates that our scheme is capable of realizing much of this
upper bound.

Index Terms

Broadcast channel, Dirty paper, MIMO, Sato bound 






I. Introduction 

The multiple-antenna Gaussian broadcast channel has recently been the subject of intense research. This 
surge of interest was spurred by the seminal work of Caire and Shamai [6], who suggested an achievable region
for this channel based on dirty-paper coding. Recently, this region was shown by Weingarten et al. [30] to
exhaust the capacity region of the channel.

However, the channel model examined in [6] assumes that the fading coefficients of the MIMO channel are 
fixed and known to both the transmitter and the receiver. In several realistic settings, the coefficients fluctuate 
over time. They are estimated at the receiver and are fed back to the transmitter. At best, we can assume that 
the transmitter has a rough, outdated estimate of the coefficients. 

Telatar [27], in his work on the single-user MIMO channel, focused on a setting where the transmitter
has zero knowledge of the fading coefficients. In a broadcast setting, this problem is typically uninteresting
because its solution is often trivial. In Appendix H we will see such a setting where time-sharing (TDMA)
is the best that can be achieved. However, in a realistic setting, the transmitter has some knowledge of the
channel to each of the users. This knowledge can be modelled as channel distribution information^1.

We assume an ergodic channel, in the sense that a new channel realization is obtained at each time instance. 
However, the channel distribution, which is known to the transmitter, remains fixed for the duration of the 
transmission. 

The analysis of ergodic broadcast channels was initiated by Cover [10]. The capacity of such channels 
is known only in special cases, where the signals to the users can be ordered according to their "strength". 
A large class of such channels, known as "more capable" channels, was considered by El Gamal [12], who 
also evaluated the capacity in this case. This class contains "degraded" and "less noisy" channels as special 
cases [12]. 

Tuninetti and Shamai [28] considered the fading scalar broadcast channel, which is a special case of the 
fading MIMO-BC channel obtained by setting the number of antennas at the transmitter and receivers to 
one. They showed that this channel is not "more capable" in general. They nonetheless evaluated the "more 
capable" region as defined by [12]. This region is still achievable despite the channel being not "more capable", 
although it is only an inner bound and does not exhaust the entire capacity region. 

Jafar et al. [16] considered the fading MISO-BC, characterized by receivers that have only one antenna 
each. They considered the case when the distribution of the fading coefficients is isotropic. In this case, they 
proved that the capacity region collapses to that of the above fading scalar channel. Lapidoth [21] examined 
a similar two-user fading MISO-BC channel, and demonstrated that in the limit of high SNR, a significant
loss is incurred as a result of the unavailability of precise channel state information at the transmitter. Sharif 
and Hassibi [25] proposed a beamforming transmission approach for the case when the knowledge available 
to the transmitter is the collection of SINR values available to each of the receivers. 

The fading MIMO-BC channel, being not "more capable" in general, is difficult to analyze. In this paper
we focus on an achievable region which is modelled on the dirty-paper region of Caire and Shamai [6]. Our
development uses a fading-paper approach which is a generalization of the dirty-paper approach of [6]. A
fading-paper solution was previously considered for a wideband fading channel in [3], although the authors assumed
an interference which is known only causally, unlike the dirty-paper problem of Costa. The proof of [30] does
not apply to the fading MIMO-BC capacity region, so that the fading-paper approach is not guaranteed to
be optimal. Furthermore, the capacity of the fading-paper channel is in general not known. We focus on its
linear-assignment capacity, which we define. We use convex-optimization methods to prove that a Gaussian
distribution achieves this capacity.

1 A different model was proposed by Jindal [18] and Caire [5], who incorporated the feedback from the receiver into the channel
model.

We compare the rate region achieved by this approach to the region that is achievable by a dirty-paper 
scheme that ignores the available channel state information and assumes that the channel is fixed at its average. 
We show that a substantial benefit is easily achieved by appropriately exploiting the available information. 

This paper is organized as follows. We begin with some background in Sec. II. We define our notation and
the channel model, discuss the dirty-paper channel and its application to transmission over the non-fading
MIMO-BC channel. In Sec. III we discuss the fading-paper generalization of the dirty-paper channel, define
the linear-assignment capacity and discuss its maximizing distribution. In Sec. IV we define a region that is
achievable using linear-assignment fading-paper transmission methods. We also compare this region to that
of dirty-paper based transmission that assumes the channel is fixed at its average. In Sec. V we present ideas
for further research and conclude the paper.

II. Background 

A. Notation 

$E_H$ denotes the expectation over the random variable $H$. Matrices are denoted by upper-case letters, with
bold indicating realizations of random variables (e.g. $\mathbf{H}$ is the realization of $H$). Vector values are denoted in
boldface and scalar values are denoted in normal typeface. With both, lower-case letters denote the realizations
of random variables ($\mathbf{y}$ is a realization of $\mathbf{Y}$ and $y$ is a realization of $Y$).

The inner product of two equal-dimension matrices $A, B \in \mathbb{R}^{M \times N}$ is defined by,

$$\langle A, B \rangle = \sum_{m=1}^{M} \sum_{n=1}^{N} A_{m,n} B_{m,n} = \mathrm{tr}[A \cdot B^T]$$

$\mathbb{R}_{+}$ denotes the non-negative real numbers and $\mathbb{R}_{++}$ the positive real numbers.
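
The identity between the entrywise sum and the trace in the inner product above can be checked numerically. The following is a minimal sketch in plain Python; the helper names (`inner_product`, `transpose`, `matmul`, `trace`) and the example matrices are illustrative assumptions and do not appear in the paper.

```python
def inner_product(A, B):
    """Entrywise inner product <A, B> = sum over m, n of A[m][n] * B[m][n]."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Plain matrix product A * B for lists of rows."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))


# Hypothetical 2 x 3 example matrices (M = 2, N = 3).
A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0, -1], [2, 1, 0]]

lhs = inner_product(A, B)            # entrywise sum
rhs = trace(matmul(A, transpose(B)))  # tr[A * B^T]
```

Both `lhs` and `rhs` evaluate the same quantity, which is why the proof later moves freely between quadratic forms such as $\mathbf{x}^T A \mathbf{x}$ and inner products such as $\langle A, \mathbf{x}\mathbf{x}^T \rangle$.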

B. System Model 

We consider a broadcast channel with $L$ users. The transmitter has $M$ transmit antennas and user $l$ has $N_l$
antennas. For simplicity we assume that all signals are real-valued.

The channel output $\mathbf{Y}_t^{(l)}$ observed by receiver $l$ at a discrete time instance $t$ is given by,

$$\mathbf{Y}_t^{(l)} = H_t^{(l)} \cdot \mathbf{X}_t + \mathbf{Z}_t^{(l)}$$

$\mathbf{Y}_t^{(l)}$ is an $N_l \times 1$ column vector. $H_t^{(l)}$ is a random $N_l \times M$ matrix denoting the channel transition matrix. We
assume that instances of $H_t^{(l)}$ are independent over time (for different values of $t$) and between users (i.e.,
for different values of $l$). As noted in Sec. I, we assume that this matrix is known to the receiver, and in our
subsequent analysis, we consider it as part of the channel output. $\mathbf{X}_t$ is an $M \times 1$ column vector denoting the
transmitted signal. $\mathbf{Z}_t^{(l)}$ denotes Gaussian noise, distributed as an $N_l$-dimensional zero-mean Gaussian random
variable with identity covariance matrix $I$^2.

2 If the noise's covariance matrix is not $I$, we can multiply $\mathbf{Y}_t^{(l)}$ by the inverse of the square root of the matrix and obtain an
equivalent channel that does agree with this model.

In the sequel, for simplicity, we will drop the time index $t$. We assume that the transmitter is subject to an
average power constraint $P$. That is, we require,

$$E\, \mathrm{tr}(\mathbf{X}\mathbf{X}^T) \le P$$

The only assumption we make on the distribution of $H^{(l)}$ is that it has finite energy, i.e. $E \langle H^{(l)}, H^{(l)} \rangle$ is
finite.

C. Dirty Paper Channels 

The dirty-paper channel was first considered by Costa [8]. It is defined by

Y = X + S + Z (1)

The channel input X is subject to a power constraint P, i.e. $E X^2 \le P$. The noise Z is distributed as a zero-mean
Gaussian variable with variance $\sigma_Z^2 > 0$. S is interference, known to the transmitter but not to the receiver.

Costa obtained the remarkable result that the interference, despite being known only to the encoder, incurs no
loss of capacity in comparison with the standard interference-free channel. Costa assumed that S is i.i.d.
Gaussian. This result was extended in [7] and [13] to arbitrarily distributed interference. Costa's result
was further extended to the Gaussian MIMO channel by Yu et al. [31]. With this channel model, vectors
Y, S, X and Z replace the above scalar equivalents, Z being a zero-mean Gaussian random vector with
nonsingular covariance matrix $\Sigma_Z$^3.

In Sec. II-D we will consider dirty-paper in the context of transmission over nonfading MIMO-BC channels.
In that context, it will be useful to consider the following variation of (1) (using vector substitutes for Y, S,
X and Z),

Y = H(X + S) + Z (2) 

where S and X are M dimensional, Y and Z are N dimensional, and H is an N x M fixed channel matrix^4.
We assume this formulation of the dirty-paper problem throughout the rest of this paper. Once again, the 
capacity coincides with that of the corresponding no-interference channel, whose output Y is given by, 

Y = HX + Z (3) 

The dirty-paper channel is an instance of the more general class of side-information channels, first considered 
by Shannon [24]. Such channels are characterized by an input X, output Y and state-dependent transition 
probabilities Pr[y|x, s] where the channel state S is i.i.d., known to the transmitter and unknown to the 
receiver. In the context of (2), the interference S constitutes the channel state.

3 Note that unlike the fading MIMO-BC model of Sec. II-B, we find it more convenient to allow $\Sigma_Z \neq I$ in this context of the
vector dirty-paper channel.

4 The matrix H is denoted in bold since in the next section it will be a realization of a random variable. 

Shannon [24] considered the case of the state sequence being known only causally. Kusnetsov and
Tsybakov [20] were the first to consider the case of state sequence known non-causally, and Gel'fand and
Pinsker [14] obtained the capacity formula for this case. The capacity of this channel is given by

$$C = \sup_{\Pr[u \mid s],\; f(\cdot)} \{ I(U;Y) - I(U;S) \} \tag{4}$$

where $U$ is an auxiliary random variable with conditional distribution $\Pr[u \mid s]$ and $f(\cdot)$ is a deterministic
function, such that the transmitted signal $X$ is given by $X = f(U, S)$.

In [31], the capacity of the dirty-paper channel was obtained from (4) using an auxiliary random variable $\mathbf{U}$
given by $\mathbf{U} = \mathbf{F} \cdot \mathbf{S} + \mathbf{X}$, where $\mathbf{F}$ is a fixed matrix^5 and $\mathbf{X}$ is a zero-mean Gaussian-distributed random variable,
independent of $\mathbf{S}$. The use of $\mathbf{X}$ has a dual role. First, it is a component in the definition of the transition
probabilities $\Pr[\mathbf{u} \mid \mathbf{s}]$. Second, given $\mathbf{U}$ and $\mathbf{S}$, the transmitted signal satisfies $\mathbf{f}(\mathbf{U}, \mathbf{S}) = \mathbf{U} - \mathbf{F} \cdot \mathbf{S} = \mathbf{X}$. The
covariance matrix $\Sigma_X$ of $\mathbf{X}$ is determined as in the no-interference channel (see e.g. [9]). An expression for $\mathbf{F}$
was developed by Yu and Cioffi [32]. In this paper, we use the following, equivalent expression:

$$\mathbf{F} = \Sigma_X \mathbf{H}^T (\mathbf{H} \Sigma_X \mathbf{H}^T + \Sigma_Z)^{-1} \mathbf{H} \tag{5}$$

A proof that this choice of $\mathbf{F}$ indeed achieves the no-interference capacity is provided in Appendix II. This
proof is different from the proof of [32], and is provided primarily for completeness.
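
As a sanity check on (5), the scalar case can be worked out in closed form. The sketch below assumes a scalar instance of (2), with Gaussian S of variance `Q`, Gaussian X of variance `P` independent of S, noise variance `N0`, and a fixed gain `h`. All function and parameter names are illustrative assumptions, and rates are in nats. It evaluates I(U;Y) - I(U;S) for U = F S + X from the joint Gaussian second moments and compares it with the no-interference capacity of (3).

```python
import math


def costa_factor(h, P, N0):
    """Scalar form of (5): F = P * h * (h * P * h + N0)^(-1) * h."""
    return P * h * h / (h * h * P + N0)


def linear_assignment_rate(h, P, Q, N0, F):
    """I(U;Y) - I(U;S) in nats for U = F*S + X and Y = h*(X + S) + Z.

    Assumes X ~ N(0, P), S ~ N(0, Q), Z ~ N(0, N0), mutually independent.
    """
    var_u = F * F * Q + P              # Var(U)
    cov_uy = h * (F * Q + P)           # Cov(U, Y)
    var_y = h * h * (P + Q) + N0       # Var(Y)
    var_u_given_y = var_u - cov_uy ** 2 / var_y
    var_u_given_s = P                  # U given S varies only through X
    # h(U|S) - h(U|Y) for Gaussian variables.
    return 0.5 * math.log(var_u_given_s / var_u_given_y)


def no_interference_capacity(h, P, N0):
    """Capacity in nats of Y = h*X + Z with power P and noise variance N0."""
    return 0.5 * math.log(1.0 + h * h * P / N0)


# Hypothetical parameter values, chosen only for illustration.
h, P, Q, N0 = 0.8, 2.0, 5.0, 1.5
rate_with_F = linear_assignment_rate(h, P, Q, N0, costa_factor(h, P, N0))
rate_without_F = linear_assignment_rate(h, P, Q, N0, 0.0)
capacity = no_interference_capacity(h, P, N0)
```

With F chosen as in (5), `rate_with_F` matches `capacity` up to rounding, while F = 0 (treating the interference as noise) gives a strictly smaller rate.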

Costa [8] and Yu [31] obtained their results using random codes and maximum-likelihood decoding. 
Zamir et al. [33] and Bennatan et al. [1] have presented practical methods for transmitting at rates that 
approach the above computed capacities. Their approaches were developed for the scalar dirty-paper channel, 
but can easily be adapted to the MIMO setting [1, Sec. VII].

D. The Dirty-Paper Achievable Region 

In their construction for the non-fading MIMO broadcast channel, Caire and Shamai [6] used dirty-paper
coding to transmit in the following way. The transmitted signal $\mathbf{X}$ is constructed as the vector sum of $L$
signals $\mathbf{X}_1, \ldots, \mathbf{X}_L$, where $\mathbf{X}_l$ contains the transmitted signal to user $l$. Each user is also allotted a virtual
power constraint $P_l$ such that $\sum_{l=1}^{L} P_l = P$. Using dirty-paper coding, the transmitter can generate the signal
$\mathbf{X}_l$ such that the interference generated by $\mathbf{X}_1, \ldots, \mathbf{X}_{l-1}$ is effectively pre-subtracted. More precisely, encoding
proceeds in the following way,

1) The transmitter begins by selecting a codeword $\mathbf{c}_1$ for user 1.

2) It then proceeds to determine the signal for user 2. It constructs the signal $\mathbf{X}_2$ for user 2 using a dirty-paper
transmission scheme, making use of its full non-causal knowledge of $\mathbf{c}_1$ and treating it as known
interference (in lieu of $\mathbf{S}$ in (2)).

3) The signals $\mathbf{X}_3, \ldots, \mathbf{X}_L$ are constructed in a similar manner. When constructing the signal to user $l$, the
signal $\mathbf{X}_1 + \mathbf{X}_2 + \ldots + \mathbf{X}_{l-1}$ is treated as non-causally known interference.

5 We denote the matrix F in bold throughout the paper in order to distinguish it from the functional F(q,Q). 

The operation of the receivers mirrors the above transmission scheme. Receiver $l$ applies dirty-paper decoding,
effectively cancelling the interference generated by $\mathbf{X}_1 + \mathbf{X}_2 + \ldots + \mathbf{X}_{l-1}$ but treating $\mathbf{X}_{l+1} + \ldots + \mathbf{X}_L$ as part
of the unknown noise (alongside $\mathbf{Z}$).
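
The bookkeeping implied by this ordering can be summarized in a few lines. The sketch below is only an illustration of which signals each user sees as pre-subtracted interference and which as noise under the encoding order 1, ..., L. The function name `dirty_paper_roles` and the dictionary keys are hypothetical, and no actual coding is performed.

```python
def dirty_paper_roles(num_users):
    """For each user l (1-based), list the user indices whose signals are
    pre-subtracted at the encoder and those treated as additional noise."""
    roles = {}
    for l in range(1, num_users + 1):
        roles[l] = {
            # X_1, ..., X_{l-1}: known non-causally when X_l is constructed.
            "presubtracted": list(range(1, l)),
            # X_{l+1}, ..., X_L: not yet constructed, so receiver l sees them as noise.
            "treated_as_noise": list(range(l + 1, num_users + 1)),
        }
    return roles


roles = dirty_paper_roles(3)
```

The first user in the order suffers interference from all later users, while the last user sees none, which is why the achievable region depends on the ordering of the users.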

The above transmission strategy defines an achievable rate region for the Gaussian MIMO broadcast channel.
This region is a function of the virtual power constraints $P_l$ imposed on the users. Furthermore, it is a function
of the covariance matrices $\Sigma_{X_l}$ by which the various codebooks for the signals $\mathbf{X}_l$ are randomly generated.
It is also a function of the ordering of the users. The convex hull of the union of all regions obtained in this
way constitutes the dirty-paper achievable region $\mathcal{C}_{DPC}(P)$. In [30], this region was shown to exhaust the
MIMO broadcast capacity region.

However, the application of dirty-paper transmission methods in the above algorithm is heavily reliant on
the availability of precise knowledge of the fixed channel matrices $\{\mathbf{H}^{(l)}\}_{l=1}^{L}$ at the transmitter. Without these,
the pre-subtraction of the signals $\{\mathbf{X}_j\}_{j<l}$, when constructing $\mathbf{X}_l$, is not possible.

III. The Fading-Paper Problem 

A. Channel Model 

The fading-paper channel is an adaptation of the dirty-paper model (as expressed in (2) of Sec. II-C),
designed to account for the absence of channel state information at the transmitter. The channel is defined by,

Y = H(X + S) + Z (6)

Unlike the case in (2), the channel matrix is random and is known to the receiver but not to the transmitter.
The pair (Y,H) constitutes the channel output, where Y is the channel observation and H is the channel
matrix.

The channel transition probabilities are also a function of the distribution of the interference $\mathbf{S}$ and of
the channel matrix $H$. In this paper, we assume $\mathbf{S}$ to be a zero-mean Gaussian distributed random variable
with covariance $\Sigma_S$. As noted in Sec. II-B, we make no assumptions on the distribution of $H$, beyond it
having finite energy. Following the discussion of side-information channels in Sec. II-C, the capacity of the
fading-paper channel is given by,

$$C = \sup_{\Pr[\mathbf{u} \mid \mathbf{s}],\; \mathbf{f}(\cdot)} \{ I(\mathbf{U}; \mathbf{Y}, H) - I(\mathbf{U}; \mathbf{S}) \} \tag{7}$$

where $\mathbf{U}$ is an auxiliary random variable whose joint distribution with $\mathbf{S}$ can be obtained via $\Pr[\mathbf{u} \mid \mathbf{s}]$. $\mathbf{f}(\cdot)$
is a vector-valued deterministic function, such that the transmitted signal $\mathbf{X}$ is given by $\mathbf{X} = \mathbf{f}(\mathbf{U}, \mathbf{S})$.

Note that for any particular choice of $\Pr[\mathbf{u} \mid \mathbf{s}]$ and $\mathbf{f}(\cdot)$, the contents of the braces are an achievable
transmission rate over the channel,

$$R_{\text{achievable}} = I(\mathbf{U}; \mathbf{Y}, H) - I(\mathbf{U}; \mathbf{S}) \tag{8}$$

B. The Linear-Assignment Capacity

In this paper, we focus on a subset of achievable rates for the fading-paper channels, modelled on the
dirty-paper capacity-achieving assignment for $\mathbf{U}$ and $\mathbf{f}(\cdot)$. That is, we focus on an auxiliary random variable
$\mathbf{U}$ given by

$$\mathbf{U} = \mathbf{F} \cdot \mathbf{S} + \mathbf{X} \tag{9}$$

where $\mathbf{F}$ is some arbitrary real-valued $M \times M$ matrix, and $\mathbf{X}$ is an arbitrary zero-mean random variable,
which may depend on $\mathbf{S}$. We define $\mathbf{f}(\mathbf{u}, \mathbf{s}) = \mathbf{u} - \mathbf{F}\mathbf{s}$. We refer to such an assignment as a linear assignment.
We call the maximum in (7), when restricted to such assignments, the linear-assignment capacity.

Linear assignments may equivalently be defined as follows. A linear assignment is characterized by an
arbitrary zero-mean $M$-dimensional random variable $\mathbf{U}$ (recall that $M$ is the dimension of $\mathbf{X}$ and $\mathbf{S}$), which
may be dependent on $\mathbf{S}$, and an arbitrary real-valued $M \times M$ matrix $\mathbf{F}$. In the context of (7), $\mathbf{U}$ corresponds
to the auxiliary variable $\mathbf{U}$ and $\mathbf{f}(\cdot, \cdot)$ is defined by $\mathbf{f}(\mathbf{u}, \mathbf{s}) = \mathbf{u} - \mathbf{F}\mathbf{s}$. A set $\mathbf{U}$, $\mathbf{F}$ and $\mathbf{f}(\cdot, \cdot)$ given by the
first definition straightforwardly satisfies the conditions of the second definition. To see that the reverse holds,
observe that we have allowed $\mathbf{X}$ to be completely arbitrary. In particular, we have in no way required $\mathbf{X}$ to be
Gaussian or independent of $\mathbf{S}$. Thus, given a pair $\mathbf{U}$ and $\mathbf{F}$ corresponding to the second definition, we may
define $\mathbf{X} = \mathbf{U} - \mathbf{F} \cdot \mathbf{S}$ and the resulting set $\mathbf{U}$, $\mathbf{X}$, $\mathbf{F}$ and $\mathbf{f}(\cdot, \cdot)$ coincides with the first definition.

The optimality of linear assignments for the dirty-paper problem of Sec. II-C is obtained from the fact
that their maximum achievable rate coincides with the capacity of the corresponding no-interference channel.
This is clearly the best we can hope for, and thus such assignments achieve capacity. With fading-paper, the
achievable rate with linear assignments is in general strictly below the no-interference upper bound. Thus, it
is not known whether it is optimal.

In our above definition of linear assignments, we left the distribution of $\mathbf{X}$ undefined. Specifically (as noted
above), we did not insist on $\mathbf{X}$ being Gaussian, and did not insist on it being independent of $\mathbf{S}$, as we did
in Sec. II-C when we discussed the capacity-achieving assignment for the dirty-paper channel. However, the
following theorem establishes the optimality of a Gaussian-distributed $\mathbf{X}$. In Sec. IV we will show that we
may also assume $\mathbf{X}$ to be independent of $\mathbf{S}$.

In the following theorem, we assume the following regularity conditions:

1) We assume that the expectations (29), (30), (31) and (32) (defined below) exist and are finite. Note that
this condition is satisfied, for example, when the distribution of $H$ is discrete and takes a finite set of
values.

2) We assume that the covariance matrix of the vector $(\mathbf{S}, \mathbf{U})$,

$$\mathrm{Cov}(\mathbf{U}, \mathbf{S}) = E\left[ \begin{pmatrix} \mathbf{U} \\ \mathbf{S} \end{pmatrix} \begin{pmatrix} \mathbf{U}^T & \mathbf{S}^T \end{pmatrix} \right] \tag{10}$$

is nonsingular (i.e., it is a positive definite matrix). Note that this also implies that $\Sigma_S$ is nonsingular,
being a principal submatrix of $\mathrm{Cov}(\mathbf{U}, \mathbf{S})$. Since,

$$\begin{pmatrix} \mathbf{U} \\ \mathbf{S} \end{pmatrix} = \begin{pmatrix} I & \mathbf{F} \\ 0 & I \end{pmatrix} \begin{pmatrix} \mathbf{X} \\ \mathbf{S} \end{pmatrix}$$

and since the matrix on the right hand side of the last equation is nonsingular, a sufficient condition
that $\mathrm{Cov}(\mathbf{U}, \mathbf{S})$ is nonsingular is that $\det(\Sigma_S) > 0$, $\det(\Sigma_X) > 0$ and $\det(\Sigma_X - \Sigma_{S,X}^T \Sigma_S^{-1} \Sigma_{S,X}) > 0$ ($\Sigma_X$
and $\Sigma_{S,X}$ are the covariance of $\mathbf{X}$ and the cross-covariance of $\mathbf{S}$ and $\mathbf{X}$, respectively).

3) We assume an arbitrary density $q(\mathbf{u} \mid \mathbf{s})$ with respect to the Lebesgue measure.

Definition 1: Given a linear assignment, the collection of matrices $\Sigma_S$, $\Sigma_{S,X}$, $\Sigma_X$ and $\mathbf{F}$ is called its setting.

Theorem 1: Assume the above-mentioned regularity conditions. For any fixed setting, the linear-assignment
capacity (as defined above) is achieved by a choice of $\mathbf{X}$ that is jointly Gaussian with $\mathbf{S}$.

Proof: We begin with a brief outline of the proof. We consider (8) as a function of the density $q(\mathbf{u} \mid \mathbf{s})$ and
of $Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$, defined below. We then seek to show that $q_G(\cdot)$ and $Q_G(\cdot)$, corresponding to a jointly Gaussian
choice of $\mathbf{X}$ and $\mathbf{S}$, maximize (8). To do so, we pose the problem as a concave constrained maximization
problem, and show that $q_G$ and $Q_G$ admit Lagrange multipliers.

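We now rewrite (8) as $F(q, Q)$, given by^6,

$$F(q, Q) = \int_{\mathbf{s} \in \mathbb{R}^M} \int_{\mathbf{u} \in \mathbb{R}^M} \int_{\mathbf{y} \in \mathbb{R}^N} \int_{\mathbf{H} \in \mathcal{R}_H} f_S(\mathbf{s})\, q(\mathbf{u} \mid \mathbf{s})\, f_{Y,H|S,X}(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x} = \mathbf{f}(\mathbf{u}, \mathbf{s})) \cdot \log \frac{Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})}{q(\mathbf{u} \mid \mathbf{s})}\; d\mathbf{H}\, d\mathbf{y}\, d\mathbf{u}\, d\mathbf{s} \tag{11}$$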

Recall that $M$ and $N$ are the dimensions of $\mathbf{S}$ and $\mathbf{Y}$, respectively. We also denote by $\mathcal{R}_H$ the support region
of the random variable $H$. $Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$ is the conditional distribution of the above-defined $\mathbf{U}$ given the channel
output $\mathbf{Y}$ and the signal fade $H$. $f_S(\mathbf{s})$ is the density of $\mathbf{S}$ and $f_{Y,H|S,X}(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x})$ is the conditional density
of $\mathbf{Y}$ and $H$ given the transmitted $\mathbf{x}$ and interference $\mathbf{s}$.

Since we make no assumptions on the distribution of $H$, the existence of this density is not guaranteed.
However, the generalization to the case when the density does not exist is straightforward. In the sequel, we
drop the subscripts and denote the densities by $f(\mathbf{s})$ and $f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x})$. Note that $f(\mathbf{s})$ should not be confused
with the previously defined $\mathbf{f}(\mathbf{u}, \mathbf{s})$.

We defined $Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$ in (11) to be the conditional density of the above-defined $\mathbf{U}$ given the channel
output $\mathbf{Y}$ and the signal fade $H$. Actually, in the sequel we find it convenient to relax this requirement and
consider $F(q, Q)$ for arbitrary probability densities $Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$. However, the pair $q$ and $Q$ that maximizes
$F(q, Q)$ will satisfy the requirement. In this we follow the example of [17].

For given $\Sigma_S$, $\Sigma_X$ and $\Sigma_{S,X}$, let $q_G(\mathbf{u} \mid \mathbf{s})$ and $Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$ denote the conditional densities corresponding
to the choice of $\mathbf{X}$ that is jointly Gaussian with $\mathbf{S}$. Our objective is to show that $q_G$ and $Q_G$ maximize $F(q, Q)$.

6 This definition is an adaptation of a similar definition by Heegard and El Gamal [17] 
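
$F(q, Q)$ as defined in (11) is jointly concave in its arguments. Thus we may wish to apply methods from the
theory of convex optimization to maximize it. Formally, we seek to solve the following constrained problem

$$\max_{q, Q} F(q, Q) \tag{12}$$

subject to

$$\int_{\mathbf{s} \in \mathbb{R}^M} \int_{\mathbf{u} \in \mathbb{R}^M} f(\mathbf{s})\, q(\mathbf{u} \mid \mathbf{s})\; \mathbf{f}(\mathbf{u}, \mathbf{s}) \cdot \mathbf{f}(\mathbf{u}, \mathbf{s})^T \, d\mathbf{u}\, d\mathbf{s} = \Sigma_X \tag{13}$$

$$\int_{\mathbf{s} \in \mathbb{R}^M} \int_{\mathbf{u} \in \mathbb{R}^M} f(\mathbf{s})\, q(\mathbf{u} \mid \mathbf{s})\; \mathbf{s} \cdot \mathbf{f}(\mathbf{u}, \mathbf{s})^T \, d\mathbf{u}\, d\mathbf{s} = \Sigma_{S,X} \tag{14}$$

$$\int_{\mathbf{u} \in \mathbb{R}^M} q(\mathbf{u} \mid \mathbf{s})\, d\mathbf{u} = 1 \quad \forall \mathbf{s} \in \mathbb{R}^M \tag{15}$$

$$\int_{\mathbf{u} \in \mathbb{R}^M} Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})\, d\mathbf{u} = 1 \quad \forall \mathbf{y} \in \mathbb{R}^N,\; \forall \mathbf{H} \in \mathcal{R}_H \tag{16}$$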


Recall that Theorem 1 assumes a fixed setting. Thus, the matrices $\Sigma_S$, $\Sigma_{S,X}$, $\Sigma_X$ and $\mathbf{F}$ are assumed to be
given and fixed. The maximization is performed over the set of distributions corresponding to these matrices,
and our objective is to show that a Gaussian distribution is optimal. Optimization of the matrices themselves
is beyond the scope of this proof (such optimization will be discussed in Sec. IV-B).

(13) and (14) are derived from the conditions $\Sigma_X$ and $\Sigma_{S,X}$ on the transmitted signal $\mathbf{X}$. That is, recalling
that $\mathbf{X} = \mathbf{f}(\mathbf{U}, \mathbf{S})$, they are equivalent to

$$E[\mathbf{X} \cdot \mathbf{X}^T] = \Sigma_X, \qquad E[\mathbf{S} \cdot \mathbf{X}^T] = \Sigma_{S,X}$$

To further simplify our analysis, we allow the arguments $q$ and $Q$ of $F(q, Q)$ to be arbitrary nonnegative
measurable functions. Constraints (15) and (16) compensate for this and ensure that the final result is a valid
conditional distribution. Functions $q$ and $Q$ that satisfy constraints (13), (14), (15) and (16) are called feasible.

A straightforward approach to our optimization problem would appear to be to apply the Karush-Kuhn-Tucker
(KKT) conditions to find the global maximum. In reality, this is slightly more involved because
equations (15) and (16) involve an infinite number of constraints. Furthermore, the arguments of $F(q, Q)$ are
functions rather than vectors. In [26], the necessity of the KKT conditions was proven under certain conditions.
In this paper, we only require their sufficiency for convex functionals, which is easier to prove. Our proof is
tailored to the setting of our particular problem. We begin by defining Lagrange multipliers.

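Definition 2: Let $q$, $Q$ be two positive-valued^7 feasible functions. Lagrange multipliers for $q$ and $Q$ are
matrices $T, \Gamma \in \mathbb{R}^{M \times M}$, and real-valued functions $\alpha(\mathbf{s}) : \mathbb{R}^M \to \mathbb{R}$ and $\beta(\mathbf{y}, \mathbf{H}) : \mathbb{R}^N \times \mathcal{R}_H \to \mathbb{R}$ such that,

$$\int_{\mathbf{y} \in \mathbb{R}^N} \int_{\mathbf{H} \in \mathcal{R}_H} f(\mathbf{s})\, f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x} = \mathbf{f}(\mathbf{u}, \mathbf{s})) \left[ \log \frac{Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})}{q(\mathbf{u} \mid \mathbf{s})} - 1 \right] d\mathbf{H}\, d\mathbf{y} + f(\mathbf{s}) \langle T, \mathbf{f}(\mathbf{u}, \mathbf{s}) \cdot \mathbf{f}(\mathbf{u}, \mathbf{s})^T \rangle + f(\mathbf{s}) \langle \Gamma, \mathbf{s} \cdot \mathbf{f}(\mathbf{u}, \mathbf{s})^T \rangle + \alpha(\mathbf{s}) = 0 \quad \forall \mathbf{s} \in \mathbb{R}^M,\; \forall \mathbf{u} \in \mathbb{R}^M \tag{17}$$

$$\int_{\mathbf{s} \in \mathbb{R}^M} f(\mathbf{s})\, f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x} = \mathbf{f}(\mathbf{u}, \mathbf{s}))\, \frac{q(\mathbf{u} \mid \mathbf{s})}{Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})}\, d\mathbf{s} + \beta(\mathbf{y}, \mathbf{H}) = 0 \quad \forall \mathbf{u} \in \mathbb{R}^M,\; \forall \mathbf{y} \in \mathbb{R}^N,\; \forall \mathbf{H} \in \mathcal{R}_H \tag{18}$$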



7 The condition that q and Q be positive-valued is required for the expressions that follow, which involve division by Q(u | y, H) 
and q(u | s), to be valid. 

We say that two functions $q$ and $Q$ admit Lagrange multipliers if Lagrange multipliers that satisfy Definition 2
exist for them.

To obtain some motivation for (17) and (18), consider the formal Lagrangian, defined as

$$\mathcal{L}(q, Q; T, \Gamma, \alpha, \beta) \triangleq F(q, Q) + \langle T, E(q) \rangle + \langle \Gamma, C(q) \rangle + \int_{\mathbf{s} \in \mathbb{R}^M} \alpha(\mathbf{s}) \int_{\mathbf{u} \in \mathbb{R}^M} q(\mathbf{u} \mid \mathbf{s})\, d\mathbf{u}\, d\mathbf{s} + \int_{\mathbf{y} \in \mathbb{R}^N} \int_{\mathbf{H} \in \mathcal{R}_H} \beta(\mathbf{y}, \mathbf{H}) \int_{\mathbf{u} \in \mathbb{R}^M} Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})\, d\mathbf{u}\, d\mathbf{H}\, d\mathbf{y} \tag{19}$$

where $E(q)$ and $C(q)$ are matrix-valued functionals given by the left-hand side of (13) and (14). Formally
differentiating $\mathcal{L}(q, Q; T, \Gamma, \alpha, \beta)$ with respect to $q(\mathbf{u} \mid \mathbf{s})$ (for given $\mathbf{u}$ and $\mathbf{s}$) and comparing with zero would
render (17). Similarly, differentiating with respect to $Q(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$ (for given $\mathbf{u}$, $\mathbf{y}$ and $\mathbf{H}$), and comparing with
zero, would render (18). However, the integrals in (19) are defined over unbounded sets, making their rigorous
analysis difficult. We therefore prefer to avoid the use of (19), and rely on Definition 2 as the definition for
Lagrange multipliers.

We are now ready for the following lemma, 

Lemma 1: Let $q^*$ and $Q^*$ be a pair of positive-valued feasible functions for the problem (12). Assume once
again that $Q^*$ is the marginal distribution of $\mathbf{U}$ given $\mathbf{y}$ and $\mathbf{H}$, when the distribution of $\mathbf{U}$ is determined
from the densities $f(\mathbf{s})$ and $q^*(\mathbf{u} \mid \mathbf{s})$. If $q^*$ and $Q^*$ admit Lagrange multipliers, then they are a solution (i.e.,
achieve the global maximum) of (12).

A proof of Lemma 1 is provided in Appendix III. The proof is basically an application of well-known concepts
from convex optimization theory. The proof of Theorem 1 now focuses on showing that the above defined $q_G$
and $Q_G$ admit Lagrange multipliers. We begin by providing the expressions for these two densities.

Recall once more that the setting of the problem (see Definition 1) is fixed. That is, we assume that $\Sigma_S$,
$\Sigma_{S,X}$, $\Sigma_X$ and $\mathbf{F}$ are given and fixed. Also recall that $\mathbf{U}$ is related to $\mathbf{S}$ and $\mathbf{X}$ through $\mathbf{U} = \mathbf{F}\mathbf{S} + \mathbf{X}$ and
that $q_G$ and $Q_G$ correspond to a choice of $\mathbf{X}$ that is jointly Gaussian with $\mathbf{S}$.

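To obtain $q_G$, we observe that since $\mathbf{U}$ and $\mathbf{S}$ are jointly Gaussian, the conditional distribution of $\mathbf{U}$ given
$\mathbf{S}$ is also Gaussian, with mean $\mathbf{m}_{U|S}(\mathbf{s})$ and covariance $\Sigma_{U|S}$ given by (see e.g. [19]),

$$\mathbf{m}_{U|S}(\mathbf{s}) = E\mathbf{U} + \mathrm{Cov}(\mathbf{U}, \mathbf{S}) \cdot \Sigma_S^{-1} \cdot (\mathbf{s} - E\mathbf{S})$$

$$\Sigma_{U|S} = \mathrm{Cov}(\mathbf{U}) - \mathrm{Cov}(\mathbf{U}, \mathbf{S}) \cdot \Sigma_S^{-1} \cdot \mathrm{Cov}(\mathbf{S}, \mathbf{U})$$

Note that by our second regularity assumption (above), that the covariance of $(\mathbf{U}, \mathbf{S})$ is nonsingular (positive
definite), it follows that $\Sigma_{U|S}$ is also nonsingular^8.
Using $\mathbf{U} = \mathbf{F}\mathbf{S} + \mathbf{X}$ and $E\mathbf{U} = E\mathbf{S} = 0$, we obtain,

$$\mathbf{m}_{U|S}(\mathbf{s}) = J\mathbf{s}, \quad \text{where } J = (\mathbf{F}\Sigma_S + \Sigma_{S,X}^T)\Sigma_S^{-1} \tag{20}$$

$$\Sigma_{U|S} = (\mathbf{F}\Sigma_S\mathbf{F}^T + \mathbf{F}\Sigma_{S,X} + \Sigma_{S,X}^T\mathbf{F}^T + \Sigma_X) - (\mathbf{F}\Sigma_S + \Sigma_{S,X}^T)\Sigma_S^{-1}(\mathbf{F}\Sigma_S + \Sigma_{S,X}^T)^T \tag{21}$$

8 To see this, assume by contradiction that $\mathbf{v}\Sigma_{U|S}\mathbf{v}^T = 0$ for some nonzero row vector $\mathbf{v}$. Thus, with probability 1 we would
have $\mathbf{v} \cdot \mathbf{U} = \mathbf{v} \cdot \mathbf{m}_{U|S}(\mathbf{S})$, and therefore, using (20), $[\mathbf{v}, -\mathbf{v}J] \cdot [\mathbf{U}^T, \mathbf{S}^T]^T = 0$. This would imply that $\mathrm{Cov}(\mathbf{U}, \mathbf{S})$ is singular.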
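
In the scalar case (M = 1), the expressions (20) and (21) reduce to simple arithmetic, which makes them easy to check. The sketch below is illustrative only: the function name `conditional_u_given_s` and its parameters (`var_s`, `var_x`, `cov_sx` standing in for $\Sigma_S$, $\Sigma_X$, $\Sigma_{S,X}$) are assumptions and not notation from the paper.

```python
def conditional_u_given_s(F, var_s, var_x, cov_sx):
    """Scalar versions of (20) and (21) for U = F*S + X.

    Returns (J, var_u_given_s), where the conditional mean of U given
    S = s is J*s and var_u_given_s is the conditional variance.
    """
    cov_us = F * var_s + cov_sx                        # Cov(U, S)
    J = cov_us / var_s                                 # (20)
    var_u = F * F * var_s + 2.0 * F * cov_sx + var_x   # Var(U)
    var_u_given_s = var_u - cov_us ** 2 / var_s        # (21)
    return J, var_u_given_s


# Independent X and S: the conditional mean slope is F, the variance is Var(X).
J_indep, var_indep = conditional_u_given_s(F=0.5, var_s=2.0, var_x=3.0, cov_sx=0.0)

# Correlated X and S (hypothetical values).
J_corr, var_corr = conditional_u_given_s(F=0.5, var_s=2.0, var_x=3.0, cov_sx=1.0)
```

In both cases the conditional variance equals `var_x - cov_sx**2 / var_s`, the variance of X given S, and does not depend on F. This is expected, since once S is known the term F S in U is a known constant.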
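
Observe that $J$ and $\Sigma_{U|S}$ are fixed matrix functions of the matrices $\Sigma_X$, $\Sigma_S$, $\Sigma_{S,X}$ and $\mathbf{F}$ that constitute the
problem setting. Hence,

$$q_G(\mathbf{u} \mid \mathbf{s}) = \frac{1}{\sqrt{\det(2\pi\Sigma_{U|S})}} \exp\left(-\frac{1}{2}(\mathbf{u} - J\mathbf{s})^T \Sigma_{U|S}^{-1} (\mathbf{u} - J\mathbf{s})\right), \quad \mathbf{s} \in \mathbb{R}^M,\; \mathbf{u} \in \mathbb{R}^M \tag{22}$$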

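To obtain $Q_G$, we observe that for fixed $\mathbf{H}$, the distribution of $\mathbf{U}$ given $\mathbf{Y}$ is also Gaussian.

$$\mathbf{m}_{U|Y,H}(\mathbf{y}, \mathbf{H}) = E[\mathbf{U} \mid H = \mathbf{H}] + \mathrm{Cov}(\mathbf{U}, \mathbf{Y} \mid H = \mathbf{H}) \cdot \mathrm{Cov}(\mathbf{Y} \mid H = \mathbf{H})^{-1} \cdot (\mathbf{y} - E[\mathbf{Y} \mid H = \mathbf{H}])$$

$$\Sigma_{U|Y,H}(\mathbf{H}) = \mathrm{Cov}(\mathbf{U} \mid H = \mathbf{H}) - \mathrm{Cov}(\mathbf{U}, \mathbf{Y} \mid H = \mathbf{H}) \cdot \mathrm{Cov}(\mathbf{Y} \mid H = \mathbf{H})^{-1} \cdot \mathrm{Cov}(\mathbf{Y}, \mathbf{U} \mid H = \mathbf{H})$$

We now claim that $\Sigma_{U|Y,H}(\mathbf{H})$ is also nonsingular. This will be shown by proving that $\Sigma_{U,Y|H}(\mathbf{H})$ is positive
definite, i.e.

$$(\boldsymbol{\alpha}^T, \boldsymbol{\beta}^T)\, \Sigma_{U,Y|H}(\mathbf{H}) \begin{pmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{pmatrix} = E\left[ (\boldsymbol{\alpha}^T \mathbf{U} + \boldsymbol{\beta}^T \mathbf{Y})^2 \mid H = \mathbf{H} \right] > 0 \quad \forall (\boldsymbol{\alpha}, \boldsymbol{\beta}) \neq 0 \tag{23}$$

Now, by (6) and (9),

$$\mathbf{Y} = H(-\mathbf{F} + I)\mathbf{S} + H\mathbf{U} + \mathbf{Z}$$

By our second regularity assumption, the covariance of $(\mathbf{U}, \mathbf{S})$ is nonsingular. It follows that $\Sigma_U$ is positive
definite. We thus conclude that (23) holds for $\boldsymbol{\beta} = 0$. If, on the other hand, $\boldsymbol{\beta} \neq 0$, then

$$E\left[ (\boldsymbol{\alpha}^T \mathbf{U} + \boldsymbol{\beta}^T \mathbf{Y})^2 \mid H = \mathbf{H} \right] = E\left[ \left(\boldsymbol{\alpha}^T \mathbf{U} + \boldsymbol{\beta}^T (H(-\mathbf{F} + I)\mathbf{S} + H\mathbf{U})\right)^2 \mid H = \mathbf{H} \right] + E\left[ (\boldsymbol{\beta}^T \mathbf{Z})^2 \right] > 0$$

since $\mathbf{Z}$ is independent of $\mathbf{X}$, $\mathbf{S}$ and $H$, and its covariance, $\Sigma_Z$, is nonsingular. This proves our claim.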
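Using similar arguments as in the above development of $q_G$, we obtain

$$\mathbf{m}_{U|Y,H}(\mathbf{y}, \mathbf{H}) = K(\mathbf{H})\mathbf{y}$$

where,

$$K(\mathbf{H}) = (\mathbf{F}\Sigma_S + \mathbf{F}\Sigma_{S,X} + \Sigma_{S,X}^T + \Sigma_X)\mathbf{H}^T \left[ \mathbf{H}(\Sigma_S + \Sigma_X + \Sigma_{S,X} + \Sigma_{S,X}^T)\mathbf{H}^T + \Sigma_Z \right]^{-1}$$

and,

$$\Sigma_{U|Y,H}(\mathbf{H}) = (\mathbf{F}\Sigma_S\mathbf{F}^T + \mathbf{F}\Sigma_{S,X} + \Sigma_{S,X}^T\mathbf{F}^T + \Sigma_X) - (\mathbf{F}\Sigma_S + \mathbf{F}\Sigma_{S,X} + \Sigma_{S,X}^T + \Sigma_X)\mathbf{H}^T \left[ \mathbf{H}(\Sigma_S + \Sigma_X + \Sigma_{S,X} + \Sigma_{S,X}^T)\mathbf{H}^T + \Sigma_Z \right]^{-1} \left[ (\mathbf{F}\Sigma_S + \mathbf{F}\Sigma_{S,X} + \Sigma_{S,X}^T + \Sigma_X)\mathbf{H}^T \right]^T \tag{24}$$

Observe that $K(\mathbf{H})$ and $\Sigma_{U|Y,H}(\mathbf{H})$ are fixed matrix functions of the matrices that constitute the problem
setting, and of $\mathbf{H}$. Hence,

$$Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H}) = \frac{1}{\sqrt{\det(2\pi\Sigma_{U|Y,H}(\mathbf{H}))}} \exp\left(-\frac{1}{2}(\mathbf{u} - K(\mathbf{H})\mathbf{y})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} (\mathbf{u} - K(\mathbf{H})\mathbf{y})\right), \quad \mathbf{y} \in \mathbb{R}^N,\; \mathbf{H} \in \mathcal{R}_H,\; \mathbf{u} \in \mathbb{R}^M \tag{25}$$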
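
We observe that $q_G(\mathbf{u} \mid \mathbf{s})$ is positive-valued for all $\mathbf{u} \in \mathbb{R}^M$ and $\mathbf{s} \in \mathbb{R}^M$. Similarly, $Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H})$ is
positive-valued for all $\mathbf{u} \in \mathbb{R}^M$, $\mathbf{y} \in \mathbb{R}^N$ and $\mathbf{H} \in \mathcal{R}_H$, where $\mathcal{R}_H$ is the support region of $H$. Therefore,
they satisfy this condition of Lemma 1. The conditions of Lemma 1 also require that $Q_G$ be the marginal
distribution of $\mathbf{U}$ given $\mathbf{y}$ and $\mathbf{H}$, when the distribution of $\mathbf{U}$ is determined from the densities $f(\mathbf{s})$ and
$q_G(\mathbf{u} \mid \mathbf{s})$. This is satisfied by definition.

We proceed by showing that the two functions $q_G$ and $Q_G$ admit Lagrange multipliers. Finding a Lagrange
multiplier $\beta(\mathbf{y}, \mathbf{H})$ to satisfy (18) is easy. As in the discussion following (47), we have

$$\int_{\mathbf{s} \in \mathbb{R}^M} f(\mathbf{s})\, f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x} = \mathbf{f}(\mathbf{u}, \mathbf{s}))\, \frac{q_G(\mathbf{u} \mid \mathbf{s})}{Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H})}\, d\mathbf{s} = g_G(\mathbf{y}, \mathbf{H})$$

Thus, defining $\beta(\mathbf{y}, \mathbf{H}) = -g_G(\mathbf{y}, \mathbf{H})$, (18) is satisfied.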

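We now turn our attention to the other Lagrange multipliers and to (17). Let $\mathbf{u}$ and $\mathbf{s}$ be fixed and let
$\mathbf{x} = \mathbf{f}(\mathbf{u}, \mathbf{s})$. Simple manipulations of (17) lead to,

$$\int_{\mathbf{y} \in \mathbb{R}^N} \int_{\mathbf{H} \in \mathcal{R}_H} f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x}) \left[ \log \frac{Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H})}{q_G(\mathbf{u} \mid \mathbf{s})} - 1 \right] d\mathbf{H}\, d\mathbf{y} + \langle T, \mathbf{x} \cdot \mathbf{x}^T \rangle + \langle \Gamma, \mathbf{s} \cdot \mathbf{x}^T \rangle + \frac{\alpha(\mathbf{s})}{f(\mathbf{s})} = 0$$

We continue,

$$\int_{\mathbf{y} \in \mathbb{R}^N} \int_{\mathbf{H} \in \mathcal{R}_H} f(\mathbf{y}, \mathbf{H} \mid \mathbf{s}, \mathbf{x}) \log Q_G(\mathbf{u} \mid \mathbf{y}, \mathbf{H})\, d\mathbf{H}\, d\mathbf{y} - \log q_G(\mathbf{u} \mid \mathbf{s}) + \langle T, \mathbf{x} \cdot \mathbf{x}^T \rangle + \langle \Gamma, \mathbf{s} \cdot \mathbf{x}^T \rangle + \frac{\alpha(\mathbf{s})}{f(\mathbf{s})} - 1 = 0 \tag{26}$$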



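We begin by examining the first element in the above sum. This element is equal to,

$$E_{Y,H}\left[\log Q_G(\mathbf{u} \mid \mathbf{Y}, H) \mid \mathbf{X} = \mathbf{x}, \mathbf{S} = \mathbf{s}\right] = -\frac{1}{2} E_{Y,H}\left[\log\det(2\pi\Sigma_{U|Y,H}(H)) \mid \mathbf{x}, \mathbf{s}\right] - \frac{1}{2} E_{Y,H}\left[(\mathbf{u} - K(H)\mathbf{Y})^T \Sigma_{U|Y,H}(H)^{-1} (\mathbf{u} - K(H)\mathbf{Y}) \mid \mathbf{x}, \mathbf{s}\right]$$

$$= -\frac{1}{2} E_H\left[\log\det(2\pi\Sigma_{U|Y,H}(H))\right] - \frac{1}{2} E_H\left\{ E_Y\left[(\mathbf{u} - K(H)\mathbf{Y})^T \Sigma_{U|Y,H}(H)^{-1} (\mathbf{u} - K(H)\mathbf{Y}) \mid \mathbf{x}, \mathbf{s}, H\right] \right\} \tag{27}$$

We now focus on the contents of the braces. We use $\mathbf{u} = \mathbf{F}\mathbf{s} + \mathbf{x}$, $\mathbf{Y} = \mathbf{H}(\mathbf{x} + \mathbf{s}) + \mathbf{Z}$ to obtain,

$$E_Y\left[(\mathbf{u} - K(\mathbf{H})\mathbf{Y})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} (\mathbf{u} - K(\mathbf{H})\mathbf{Y}) \mid \mathbf{x}, \mathbf{s}, \mathbf{H}\right] = \mathbf{x}^T\left[(I - K(\mathbf{H})\mathbf{H})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} (I - K(\mathbf{H})\mathbf{H})\right]\mathbf{x} + \mathbf{s}^T\left[(\mathbf{F} - K(\mathbf{H})\mathbf{H})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} (\mathbf{F} - K(\mathbf{H})\mathbf{H})\right]\mathbf{s} + 2\mathbf{s}^T\left[(\mathbf{F} - K(\mathbf{H})\mathbf{H})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} (I - K(\mathbf{H})\mathbf{H})\right]\mathbf{x} + \mathrm{tr}\left[K(\mathbf{H})^T \Sigma_{U|Y,H}(\mathbf{H})^{-1} K(\mathbf{H}) \Sigma_Z\right]$$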
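
Thus, we can rewrite (27) as,

$$\mathbf{x}^T A \mathbf{x} + \mathbf{s}^T B \mathbf{s} + \mathbf{s}^T C \mathbf{x} + D = \langle A, \mathbf{x} \cdot \mathbf{x}^T \rangle + \langle B, \mathbf{s} \cdot \mathbf{s}^T \rangle + \langle C, \mathbf{s} \cdot \mathbf{x}^T \rangle + D \tag{28}$$

where,

$$A = -\frac{1}{2} E_H\left[(I - K(H)H)^T \Sigma_{U|Y,H}(H)^{-1} (I - K(H)H)\right] \tag{29}$$

$$B = -\frac{1}{2} E_H\left[(\mathbf{F} - K(H)H)^T \Sigma_{U|Y,H}(H)^{-1} (\mathbf{F} - K(H)H)\right] \tag{30}$$

$$C = -E_H\left[(\mathbf{F} - K(H)H)^T \Sigma_{U|Y,H}(H)^{-1} (I - K(H)H)\right] \tag{31}$$

$$D = -\frac{1}{2} E_H\left[\log\det(2\pi\Sigma_{U|Y,H}(H))\right] - \frac{1}{2} E_H\left\{\mathrm{tr}\left[K(H)^T \Sigma_{U|Y,H}(H)^{-1} K(H) \Sigma_Z\right]\right\} \tag{32}$$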

By the conditions of Theorem 1 the above expectations exist and are finite. Turning to the second element of the sum in (26) we obtain, using (22),

$$-\log q_G(u | s) = \tfrac{1}{2}\log\det\left(2\pi\Sigma_{U|S}\right) + \tfrac{1}{2}(u - Js)^T \Sigma_{U|S}^{-1} (u - Js) \tag{33}$$

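Applying a similar development to that of (27), we can rewrite (33) as,

$$\langle \tilde{A}, x x^T\rangle + \langle \tilde{B}, s s^T\rangle + \langle \tilde{C}, s x^T\rangle + \tilde{D} \tag{34}$$

where

$$\tilde{A} = \tfrac{1}{2}\Sigma_{U|S}^{-1}, \qquad \tilde{B} = \tfrac{1}{2}(F - J)^T \Sigma_{U|S}^{-1} (F - J), \qquad \tilde{C} = (F - J)^T \Sigma_{U|S}^{-1}, \qquad \tilde{D} = \tfrac{1}{2}\log\det\left(2\pi\Sigma_{U|S}\right)$$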



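Using (28) and (34), we can rewrite (26) as,

$$\langle A + \tilde{A} + T, x x^T\rangle + \langle B + \tilde{B}, s s^T\rangle + \langle C + \tilde{C} + \Gamma, s x^T\rangle + D + \tilde{D} + \frac{\alpha(s)}{f(s)} - 1 = 0$$

Finally, we may select our Lagrange multipliers for (17) as follows, completing the proof of Theorem 1.

$$T = -(A + \tilde{A}), \qquad \Gamma = -(C + \tilde{C}), \qquad \alpha(s) = f(s)\left[1 - D - \tilde{D} - \langle B + \tilde{B}, s s^T\rangle\right]$$

□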



Note that with linear-assignment, when X and S are jointly-Gaussian, the achievable rate $I(U; Y, H) - I(U; S)$ is a function of the setting (as defined in Definition 1). The expression for the achievable rate can be computed as follows,

$$I(U; Y, H) - I(U; S) = h(U | S) - h(U | Y, H) = \tfrac{1}{2}\log\det\Sigma_{U|S} - \tfrac{1}{2} E_H\left[\log\det\Sigma_{U|Y,H}(H)\right] \tag{35}$$



The last equation is obtained from the following discussion. For fixed s, the conditional distribution of U given S = s is Gaussian with mean $Js$ and covariance $\Sigma_{U|S}$ (which is given by (21) and is independent of s). For fixed y and H, the conditional distribution of U given Y = y and H = H is Gaussian with mean $K(H)y$ and covariance $\Sigma_{U|Y,H}(H)$ (which is given by (24) and is independent of y but dependent on H). Since the differential entropy of a Gaussian vector depends only on its covariance, (35) follows.
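
As a minimal numerical sketch of (35), the snippet below evaluates the rate for a discrete fade, assuming that X and S are independent. The conditional covariances are obtained here from standard Gaussian conditioning applied to U = FS + X and Y = H(X + S) + Z, rather than copied from (21) and (24); the function name `lafp_rate_bits` and its argument layout are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def lafp_rate_bits(F, sigma_x, sigma_s, sigma_z, fades):
    """Right-hand side of (35), in bits, assuming X and S are independent.

    fades: list of (H, probability) pairs describing a discrete fade.
    """
    # With X independent of S, U = F S + X has covariance Sigma_X given S.
    sigma_u_given_s = sigma_x
    sigma_u = F @ sigma_s @ F.T + sigma_x
    expected_logdet = 0.0
    for H, prob in fades:
        # Y = H (X + S) + Z, so Cov(U, Y) and Cov(Y) follow by linearity.
        sigma_uy = (F @ sigma_s + sigma_x) @ H.T
        sigma_y = H @ (sigma_x + sigma_s) @ H.T + sigma_z
        # Gaussian conditioning (Schur complement) gives Cov(U | Y, H).
        sigma_u_given_y = sigma_u - sigma_uy @ np.linalg.solve(sigma_y, sigma_uy.T)
        expected_logdet += prob * np.linalg.slogdet(sigma_u_given_y)[1]
    nats = 0.5 * np.linalg.slogdet(sigma_u_given_s)[1] - 0.5 * expected_logdet
    return nats / np.log(2.0)

one = np.array([[1.0]])
# F = 0 treats the interference as noise: 0.5 * log2(1 + 1/2) in this scalar case.
rate_no_precoding = lafp_rate_bits(np.zeros((1, 1)), one, one, one, [(one, 1.0)])
# F = 1/2 is the scalar dirty-paper choice and removes the interference entirely.
rate_dirty_paper = lafp_rate_bits(0.5 * one, one, one, one, [(one, 1.0)])
```

In the scalar, non-fading case the two calls bracket the effect of F: the first reproduces the interference-as-noise rate and the second the interference-free rate.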

Note that the achievability proof of Gel'fand and Pinsker [14], which states that we may indeed achieve the rate $I(U; Y, H) - I(U; S)$, assumes that the random variables involved are discrete-valued. In Appendix IV we use quantization arguments to prove that $F(q_G, Q_G)$, defined using (11) (which assumes continuous random variables), is indeed achievable.

IV. The Linear-Assignment Fading-Paper (LAFP) Achievable Region 

A. Definition 

In Sec. III-D we described how dirty-paper transmission methods can be used to construct an algorithm for transmission over the non-fading MIMO-BC channel. The same approach can be used to construct an algorithm for transmission over the fading MIMO-BC channel, using the linear-assignment fading-paper transmission methods of Sec. III.

In our approach, we rely on Theorem 1 and confine our attention to Gaussian distributions for the signals $\{X_l\}_{l=1}^{L}$, defined as in Sec. III-D. Our choice is greedy in the sense that we seek to maximize the rate to each user individually, while a global perspective could possibly prescribe a different choice. However, a similar choice in the definition of the dirty-paper achievable region was eventually proven to coincide with the global optimum as well. We refer to the convex hull of the union of rate regions that are achievable using this approach as the linear-assignment fading-paper (LAFP) achievable region.

The analysis of Weingarten et al. [30] does not apply to the fading setting. Furthermore, linear assignments have not been proven to exhaust the capacity of the fading-paper channel. Thus, unlike the dirty-paper achievable region of Sec. III-D, the LAFP achievable region is not guaranteed to be optimal.

The determination of the dirty-paper achievable region of Sec. III-D involves determining the covariance matrices $\Sigma_X^{(l)}$ for the various signals $X_l$ (see e.g. [6] and [29]). However, each signal $X_l$ is assumed to be independent of the interference $S_l = \sum_{i<l} X_i$, and Gaussian. In our above definition of the LAFP, we have not restricted ourselves to signals $\{X_l\}_{l=1}^{L}$ that are independent of their respective interferences $\{S_l\}_{l=1}^{L}$. Thus, in addition to determining $\Sigma_X^{(l)}$, it would appear that we must determine the covariance $\Sigma_{S,X}^{(l)}$ between $X_l$ and $S_l$ as well.

However, the following theorem proves that we may indeed confine ourselves to $\Sigma_{S,X}^{(l)} = 0$, without loss of optimality.

Theorem 2: The LAFP achievable region is exhausted by a choice of random variables $\{X_l\}_{l=1}^{L}$ for the various users that are independent of their respective interferences $\{S_l\}_{l=1}^{L}$.
The proof of this theorem is provided in Appendix V.

Note that in this theorem we do not claim that for the given fading-paper problem observed by user l, selecting $X_l$ to be independent of $S_l$ incurs no loss of optimality. Rather, the proof involves replacing an entire given set of signals $X_1, \ldots, X_L$, which may not be independent (corresponding to some set of achievable rates on the LAFP achievable region), with a new set $\hat{X}_1, \ldots, \hat{X}_L$ that are independent, without sacrificing the rates of the individual users. In the resulting set, user l's signal $\hat{X}_l$ is indeed independent of $\hat{S}_l = \sum_{i<l} \hat{X}_i$. However, the independence was achieved also by altering the fading-paper problem this user faces.

B. Comparison with Dirty-Paper Transmission 

So far, we have focused on similarities between dirty-paper transmission over a fixed MIMO-BC channel and LAFP transmission over a fading MIMO-BC channel. Both approaches use linear strategies, and both employ independently distributed Gaussian random variables to construct their signals to the receivers.

However, the two methods differ in two important ways. 

1) The choice of the constant matrix F in dirty-paper transmission is based on the fixed channel matrix
H. With fading-paper, only the statistics of H are known and thus F must be selected differently.

2) The fading-paper receiver accounts for a channel fade H that fluctuates from one time instance to 
another. The dirty-paper receiver assumes that H is fixed. More precisely, the dirty paper decoder seeks 
a codeword that is jointly typical with y, while the fading paper decoder seeks a codeword that is jointly 
typical with both y and H. 

Despite these two shortcomings, dirty-paper transmission can still be applied to a fading-paper channel by 
simply assuming that H is fixed at its average, and treating its fluctuations as noise. For a fading paper 
transmission strategy to be interesting, we must demonstrate that its performance surpasses that of dirty-paper 
transmission. 

An evaluation of the dirty-paper achievable region (i.e., when the transmitter and receiver assume that the channel is fixed at its expected value $E[H]$) over the fading MIMO-BC channel is difficult. This is because of the operation of the decoder, which uses a mismatched model of the channel. However, we may obtain an outer bound on the dirty-paper achievable region if we replace the receiver with an optimal LAFP receiver that uses the channel information available to it (unlike the standard dirty-paper receiver). In this case, the achievable rate may be obtained from (35). With the dirty-paper achievable region, however, the matrices F (for each instance of $\Sigma_X$ and for each user) are not the optimal fading-paper matrices, but rather are computed using (5), under the assumption of a fixed channel matrix, equal to $E[H]$. Under these conditions, the approach differs from LAFP only in the way the matrix F is selected.

We let $F_{DPC}(H)$ denote the choice of F with dirty-paper transmission over a channel whose fixed channel matrix is H. That is, $F_{DPC}(H)$ is a matrix function of H, given by the right hand side of (5) (for brevity of notation, we neglect the reliance of $F_{DPC}(\cdot)$ on $\Sigma_X$ and $\Sigma_Z$). With this notation, the choice of F that is used in the above-mentioned dirty-paper-like transmission strategy is $F_{DPC}(E[H])$.

Evaluating the LAFP region involves determining the union of the regions obtained for all matrices F. Equivalently, it involves maximizing (35) over F (e.g. using a grid search) given the covariances of X and S (note that by Theorem 2 we set $\Sigma_{S,X} = 0$). However, we obtained an inner bound by restricting our attention, for each $\Sigma_X$ and $\Sigma_Z$, to the set

$$\mathcal{F} = \left\{F_{DPC}(H) : H \in \mathcal{R}_H\right\} \tag{36}$$




Fig. 1. Comparison between an inner bound on the LAFP achievable region and an outer bound on the dirty paper achievable region. 

Fig. 1 presents a numerical example where the two approaches are compared. In this example, there are two users (receivers). The transmitter has two antennas (M = 2) and the receivers have one antenna each ($N_1 = N_2 = 1$). The power constraint is $P_{tot} = 10$. The distributions of the channel matrices are given by,

$$H^{(1)} = \begin{cases} [1,\ 0.4] & \text{with probability } 1/2 \\ [1,\ 3] & \text{with probability } 1/2 \end{cases} \qquad H^{(2)} = \begin{cases} [0.4,\ 1] & \text{with probability } 1/2 \\ [3,\ 1] & \text{with probability } 1/2 \end{cases}$$

The noise variance at each receiver is 1. 

The achievable regions in both cases (i.e. LAFP and dirty-paper) were found by first applying a grid search for the matrices $\Sigma_X^{(1)}$ and $\Sigma_X^{(2)}$. In line with Theorem 2 we assumed without loss of optimality that the two signals $X^{(1)}$ and $X^{(2)}$ are independent.

For each such pair $\Sigma_X^{(1)}$ and $\Sigma_X^{(2)}$, the matrix F for user 2 was computed as described above. That is, for the LAFP achievable region, F was found by maximizing the achievable rate of user 2 over the set $\mathcal{F}$ (which is a function of the user's covariance matrix $\Sigma_X$; see footnote 9). For the dirty-paper achievable region, $F_{DPC}(E[H])$ was used.

9 In the general case, where there are more than two users, $\mathcal{F}$ is also a function of the covariance of the unknown interference from subsequent users, which must be accounted for in the effective noise as explained in Appendix VI.

With both schemes, for fixed matrices $\Sigma_X^{(1)}$, $\Sigma_X^{(2)}$ and F, the achievable rates $R_1$ and $R_2$ for the two users were computed as follows. $R_1$ was obtained using the following expression (recall that user 1's observed signal is scalar in this example),



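$$R_1 = \frac{1}{2} E_{H^{(1)}}\left[\log\left(1 + \frac{H^{(1)} \Sigma_X^{(1)} H^{(1)T}}{H^{(1)} \Sigma_X^{(2)} H^{(1)T} + 1}\right)\right]$$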



$R_2$ is given by the right hand side of (35). Since we have assumed $X^{(1)}$ and $X^{(2)}$ to be independent, the expressions for $\Sigma_{U|S}$ and $\Sigma_{U|Y,H}(H)$ (which appear in (35)) are simplified (see footnote 10). That is, $\Sigma_{U|S} = \Sigma_X$ and $\Sigma_{U|Y,H}(H)$ is obtained from (24) by setting $\Sigma_{S,X}$ to zero.

The maximal sum-rate on the dirty paper outer bound was 2.7 bits per channel use, while the maximum 
sum-rate on the LAFP inner bound was 2.86. This achievable rate was obtained by selecting, 



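$$\Sigma_X^{(1)} = \begin{bmatrix} 1 & 2 - \epsilon \\ 2 - \epsilon & 4 \end{bmatrix}, \qquad \Sigma_X^{(2)} = \begin{bmatrix} 4.5 & -1.5 + \epsilon \\ -1.5 + \epsilon & 0.5 \end{bmatrix}, \qquad F = \begin{bmatrix} 1.0909 & 0.3636 \\ -0.3636 & -0.1212 \end{bmatrix}$$

where $0 < \epsilon \to 0$ such that $\Sigma_X^{(1)}$ and $\Sigma_X^{(2)}$ are positive definite. Thus, a simple approach, which uses knowledge of the channel distribution at the transmitter, was able to produce at least a 6% increase in throughput.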
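
The example can be partly re-traced numerically. The sketch below is an illustration under stated assumptions: it takes the limit $\epsilon \to 0$, writes the dirty-paper matrix as F = WH with W the LMMSE matrix of (37) (as noted in Appendix II), and assumes that user 1 sees user 2's signal as additional Gaussian noise with a factor 1/2 for the real-valued channel. The names `f_dpc` and `rate_user1_bits` are hypothetical helpers, not part of the paper.

```python
import numpy as np

def f_dpc(H, sigma_x, sigma_z):
    """Dirty-paper matrix F = W H, with W the LMMSE matrix of (37)."""
    W = sigma_x @ H.T @ np.linalg.inv(H @ sigma_x @ H.T + sigma_z)
    return W @ H

# Covariances reported in the example, taken at the limit epsilon -> 0.
sigma_x1 = np.array([[1.0, 2.0], [2.0, 4.0]])
sigma_x2 = np.array([[4.5, -1.5], [-1.5, 0.5]])
sigma_z = np.array([[1.0]])

# Two equiprobable fades per user.
fades_user1 = [(np.array([[1.0, 0.4]]), 0.5), (np.array([[1.0, 3.0]]), 0.5)]
fades_user2 = [(np.array([[0.4, 1.0]]), 0.5), (np.array([[3.0, 1.0]]), 0.5)]

total_power = np.trace(sigma_x1) + np.trace(sigma_x2)

# Candidate set (36) for user 2: one matrix per point in the support of H^(2).
candidates = [f_dpc(H, sigma_x2, sigma_z) for H, _ in fades_user2]

# Dirty-paper-like strategy: F_DPC evaluated at the mean fade E[H^(2)].
mean_fade_user2 = sum(p * H for H, p in fades_user2)
f_at_mean = f_dpc(mean_fade_user2, sigma_x2, sigma_z)

def rate_user1_bits(fades, sigma_own, sigma_other):
    """User 1's rate when the other user's signal acts as extra noise."""
    rate = 0.0
    for H, p in fades:
        signal = (H @ sigma_own @ H.T).item()
        noise = (H @ sigma_other @ H.T).item() + 1.0
        rate += p * 0.5 * np.log2(1.0 + signal / noise)
    return rate

r1 = rate_user1_bits(fades_user1, sigma_x1, sigma_x2)
```

Under these assumptions the covariances use the full power budget of 10, and the candidate associated with the fade [3, 1] coincides with the matrix F listed above to four decimal places.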

Although we have not established the optimality of the LAFP achievable region, we can obtain an idea of how far we are from the optimum using a cooperative upper bound on the achievable sum-capacity (i.e., the maximum achievable sum rate to all users), as suggested by Sato [23]. The use of such a bound in the context of the (non-fading) MIMO-BC channel was first suggested by Caire and Shamai [6]. Computation of cooperative upper bounds for the above fading MIMO-BC example is discussed in Appendix VII. We obtained a bound of 3.17 on the maximum achievable sum-rate. Thus, in terms of the sum-rate, LAFP is capable of transmission at rates that are at most 10% below the optimum.

In Appendix VI we discuss the computation of the LAFP achievable region with more than two users.



V. Conclusion 

A. Suggestions for Further Research 

1) Heuristic methods for computing F. Expression (36), with which we computed the matrix F for the
LAFP region in Sec. IV-B, was developed heuristically. A different expression could possibly produce a
substantially larger achievable region. One option would be to search for F along a fine grid (as noted
in Sec. IV-B). An alternative option would be to apply a gradient ascent method, using F as defined
in (36) as a starting point.

2) A wider range of strategies. The confinement to linear assignments as defined in Sec. III is in no
way known to be optimal. Dupuis et al. [11] suggested an algorithm that is based on the concepts
of the Blahut-Arimoto algorithm, that can theoretically be used to evaluate the capacity of a general
side-information channel (of which the fading paper channel is an instance). In practice, applying the
algorithm requires evaluations over a set of strategies which is impossibly large. However, applying
the algorithm over any subset of these strategies produces an achievable rate. This achievable rate may
further narrow the gap to the cooperative upper bound (as discussed in Sec. IV-B).

10 In the context of our discussion, $S = X^{(1)}$, $X = X^{(2)}$ and Z has covariance $\Sigma_Z = 1$. $U = FS + X$, as usual.

B. Concluding Remarks 

The problem of transmitting over fading MIMO-BC channels is of great practical interest. In this paper we 
presented an achievable region for this channel that relies on fading-paper transmission strategies. Our main 
contribution is Theorem [T] which proves that a Gaussian distribution achieves the linear-assignment capacity. 
We believe that the approach we developed in the proof of that theorem, which employs convex-analysis 
methods, could be useful in further analysis of this channel. 

In Sec. IV-B we have shown that a simple approach, which makes use of the channel distribution information available to the transmitter, easily produces a gain over dirty-paper transmission. Further research (perhaps along the lines of Sec. V-A) could produce further performance gains.

Appendix I 

The Optimal Achievable Rate with Zero Channel State Information at the Transmitter 

Consider a broadcast channel where all the receivers have the same number of antennas. We wish to show 
that capacity in this case is achieved by time-sharing among the users. 

A channel model that assumes zero knowledge of the channel fade to each of the users, effectively assumes 
that all channels are the same. The signals at the different receivers are equivalent in their statistical properties, 
and thus each receiver is capable, besides decoding its own signal, of decoding all the messages to the other
users as well. Thus, the sum-rate of this system is upper-bounded by the single-user rate of each of the users. 
Such a capacity region is exhausted by time-sharing. 

Appendix II 

The Optimal Matrix F in the Achievability Proof for Dirty-Paper 

In this appendix we prove the optimality of F as defined by (5). We let U and X be defined as in the discussion preceding (5). The achievable rate with this choice is given by $I(U; Y) - I(U; S)$ (see (01)). We now seek to prove that this rate coincides with the capacity of the corresponding no-interference channel defined by (3). Our proof follows the lines of a similar proof by Cohen and Lapidoth [6] for the scalar dirty-paper channel.

To obtain our result, we prove a stronger one: for any choice of $\Sigma_X$, letting F be given by (5), the achievable rate coincides with the achievable rate $I(X; \tilde{Y})$ for the no-interference channel (3).

Our objective is thus to show that the achievable rate $I(U; Y) - I(U; S)$, with this choice of F, coincides with the achievable rate of the no-interference channel when the input X is distributed as $\mathcal{N}(0, \Sigma_X)$.



TO BE PUBLISHED, IEEE TRANSACTIONS ON INFORMATION THEORY 19 

Let $\hat{X} = W\tilde{Y}$ be the linear minimum mean-square error (LMMSE) estimate for X given $\tilde{Y}$, where $\tilde{Y} = HX + Z$ denotes the output of the no-interference channel. W is obtained by [19],

$$W = \mathrm{Cov}(X, \tilde{Y})\,\mathrm{Cov}(\tilde{Y})^{-1} = \Sigma_X H^T \left(H \Sigma_X H^T + \Sigma_Z\right)^{-1} \tag{37}$$

By definition of the LMMSE estimate, the error $E = X - \hat{X}$ is uncorrelated with $\tilde{Y}$. Since E and $\tilde{Y}$ are jointly Gaussian, they are also independent. S is independent of both, and thus E is independent of $Y = \tilde{Y} + HS$. Examining $I(U; Y) - I(U; S)$, we have

$$I(U; Y) - I(U; S) = h(U | S) - h(U | Y) \tag{38}$$

We now examine both elements of the difference on the right hand side of the above.

$$h(U | S) = h(FS + X | S) = h(X) \tag{39}$$

where the last equation is obtained by the fact that S and X are independent.

$$\begin{aligned}
h(U | Y) &= h(FS + X | Y) \overset{(a)}{=} h(WHS + X | Y) \\
&= h(WHS + X - WY | Y) = h(WHS + X - W(HS + HX + Z) | Y) \\
&= h(X - W(HX + Z) | Y) = h(X - \hat{X} | Y) = h(E | Y) \overset{(b)}{=} h(E) \overset{(c)}{=} h(E | \tilde{Y}) \\
&= h(X - W\tilde{Y} | \tilde{Y}) = h(X | \tilde{Y})
\end{aligned} \tag{40}$$

Equality (a) is obtained from the observation that the right hand side of (5) equals $W \cdot H$ where W is given by (37). Equalities (b) and (c) are obtained from the fact that E is independent of Y and $\tilde{Y}$. Finally, combining (38), (39) and (40) we obtain our desired result,

$$I(U; Y) - I(U; S) = h(X) - h(X | \tilde{Y}) = I(X; \tilde{Y})$$

□
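
The two facts that drive the proof can be checked numerically for randomly drawn parameters. The sketch below is illustrative only: the dimensions, the random seed and the helper `random_covariance` are arbitrary assumptions. It verifies that the LMMSE error is uncorrelated with $\tilde{Y}$, and that with F = WH the conditional covariance of U given Y equals that of X given $\tilde{Y}$, which is what makes $h(U | Y) = h(X | \tilde{Y})$ in (40).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 3, 2  # transmit and receive dimensions (arbitrary for this check)

def random_covariance(dim):
    """Random symmetric positive definite matrix."""
    A = rng.standard_normal((dim, dim))
    return A @ A.T + np.eye(dim)

sigma_x, sigma_s = random_covariance(M), random_covariance(M)
sigma_z = random_covariance(N)
H = rng.standard_normal((N, M))

# LMMSE matrix of (37) and the dirty-paper choice F = W H.
sigma_y_tilde = H @ sigma_x @ H.T + sigma_z
W = sigma_x @ H.T @ np.linalg.inv(sigma_y_tilde)
F = W @ H

# Orthogonality principle: Cov(X - W Y~, Y~) should vanish.
cov_error_y_tilde = sigma_x @ H.T - W @ sigma_y_tilde

# Cov(X | Y~) for the no-interference channel.
sigma_x_given_y_tilde = sigma_x - W @ H @ sigma_x

# Cov(U | Y) with interference: U = F S + X, Y = H (X + S) + Z, X independent of S.
sigma_u = F @ sigma_s @ F.T + sigma_x
sigma_uy = (F @ sigma_s + sigma_x) @ H.T
sigma_y = H @ (sigma_x + sigma_s) @ H.T + sigma_z
sigma_u_given_y = sigma_u - sigma_uy @ np.linalg.solve(sigma_y, sigma_uy.T)
```

Equal conditional covariances imply equal conditional differential entropies for Gaussian vectors, so the interference S costs nothing in rate for this choice of F.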

Appendix III 
Proof of Lemma 1

Let q and Q be a pair of feasible functions for (12). We will now show that $F(q, Q) \le F(q^*, Q^*)$.

$$F(q, Q) - F(q^*, Q^*) = \int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M}\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot \left[q(u | s)\log\frac{Q(u | y, H)}{q(u | s)} - q^*(u | s)\log\frac{Q^*(u | y, H)}{q^*(u | s)}\right] dH\, dy\, du\, ds \tag{41}$$

Let $l(x, y) = x \cdot \log(y/x)$. This function is jointly concave in its arguments. By the gradient inequality [4, Chapter 3, Section 3.1.3] for concave functions, we have for arbitrary $x, y \in \mathbb{R}_+$ and $x^*, y^* \in \mathbb{R}_+$,

$$l(x, y) - l(x^*, y^*) \le l_x(x^*, y^*) \cdot (x - x^*) + l_y(x^*, y^*) \cdot (y - y^*)$$
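
The gradient inequality for $l(x, y) = x\log(y/x)$ is easy to spot-check numerically. The sketch below draws random positive points and evaluates the largest gap between the two sides; the sampling range, sample count and seed are arbitrary assumptions, and the partial derivatives used are $l_x = \log(y/x) - 1$ and $l_y = x/y$.

```python
import numpy as np

def l(x, y):
    return x * np.log(y / x)

def l_x(x, y):
    # Partial derivative of l with respect to x.
    return np.log(y / x) - 1.0

def l_y(x, y):
    # Partial derivative of l with respect to y.
    return x / y

rng = np.random.default_rng(1)
# Four independent batches of strictly positive sample points.
x, y, x_star, y_star = rng.uniform(0.05, 5.0, size=(4, 10000))

lhs = l(x, y) - l(x_star, y_star)
rhs = l_x(x_star, y_star) * (x - x_star) + l_y(x_star, y_star) * (y - y_star)

# Concavity predicts lhs <= rhs everywhere, up to floating-point rounding.
largest_violation = np.max(lhs - rhs)
```

Algebraically, the difference between the two sides reduces to $x(\log t + 1 - t)$ with $t = y x^* / (x y^*)$, which is nonpositive because $\log t \le t - 1$.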






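where $l_x$ and $l_y$ denote the partial derivatives of l with respect to x and y, respectively. Thus, we can bound (41) by,

$$\begin{aligned}
F(q, Q) - F(q^*, Q^*) \le \int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M}\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} & f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot \\
\cdot \big[ & l_x(q^*(u | s), Q^*(u | y, H)) \cdot (q(u | s) - q^*(u | s)) \\
& + l_y(q^*(u | s), Q^*(u | y, H)) \cdot (Q(u | y, H) - Q^*(u | y, H))\big]\, dH\, dy\, du\, ds
\end{aligned} \tag{42}$$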

In the development below, we will show that this integral equals zero. This will then conclude the proof of 
the lemma. 

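To prove this, we will show that the two integrals below equal zero. For simplicity of notation, we let q and Q denote $q(u | s)$ and $Q(u | y, H)$, respectively.

$$\int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M}\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot l_x(q^*, Q^*) \cdot (q - q^*)\, dH\, dy\, du\, ds = 0 \tag{43}$$

$$\int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M}\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot l_y(q^*, Q^*) \cdot (Q - Q^*)\, dH\, dy\, du\, ds = 0 \tag{44}$$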
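We first prove (43). Multiplying (17) by $q - q^*$, and using the fact that $l_x(x, y) = \log(y/x) - 1$, we get

$$\begin{aligned}
&\left[\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} f(s) f(y, H | s, x = \mathbf{f}(u, s))\, l_x(q^*, Q^*)\, dH\, dy\right](q - q^*) \\
&\quad + \left[f(s)\langle T, \mathbf{f}(u, s) \cdot \mathbf{f}(u, s)^T\rangle\right](q - q^*) + \left[f(s)\langle \Gamma, s \cdot \mathbf{f}(u, s)^T\rangle\right](q - q^*) + \alpha(s)(q - q^*) = 0 \\
&\qquad \forall s \in \mathbb{R}^M,\ \forall u \in \mathbb{R}^M
\end{aligned}$$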

Integrating the above with respect to u and s would yield zero. We now focus on the integrals of the individual elements of the above sum. The first integral is equal to the left hand side of (43). To prove this integral is zero, we will show that the other integrals are zero. This will yield (43).

We first integrate with respect to u and then s. The order of integration matters, because the range of 
the integration is unbounded, and some of the integrands are not non-negative and not necessarily Lebesgue- 
integrable (i.e., the integral of their absolute value may be infinite). 



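$$\begin{aligned}
\int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M} f(s)\langle T, \mathbf{f}(u, s) \cdot \mathbf{f}(u, s)^T\rangle (q - q^*)\, du\, ds
&= \left\langle T, \int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M} f(s)\, \mathbf{f}(u, s) \cdot \mathbf{f}(u, s)^T \cdot q\, du\, ds\right\rangle \\
&\quad - \left\langle T, \int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M} f(s)\, \mathbf{f}(u, s) \cdot \mathbf{f}(u, s)^T \cdot q^*\, du\, ds\right\rangle \\
&= \langle T, \Sigma_X - \Sigma_X\rangle = 0
\end{aligned}$$

The equality before last results from (13) and from the feasibility of the functions q and $q^*$. In a similar way, using (14), we obtain that,

$$\int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M} f(s)\langle \Gamma, s \cdot \mathbf{f}(u, s)^T\rangle (q - q^*)\, du\, ds = 0$$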
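Finally, we examine the last integral.

$$\int_{s \in \mathbb{R}^M}\int_{u \in \mathbb{R}^M} \alpha(s)(q - q^*)\, du\, ds = \int_{s \in \mathbb{R}^M} \alpha(s)\left[\int_{u \in \mathbb{R}^M} q\, du - \int_{u \in \mathbb{R}^M} q^*\, du\right] ds = \int_{s \in \mathbb{R}^M} \alpha(s)\left[1 - 1\right] ds = 0$$

The equality before last results from (15). Thus, we obtain (43).
Similarly, relying on (18) and (16), we obtain,

$$\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H}\int_{u \in \mathbb{R}^M}\int_{s \in \mathbb{R}^M} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot l_y(q^*, Q^*) \cdot (Q - Q^*)\, ds\, du\, dH\, dy = 0 \tag{45}$$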



The order of integration, unfortunately, is not that of (44). To prove that we may change the order of integration, we must prove that the integrand is Lebesgue-integrable (Fubini's Theorem, see e.g. [2, Theorem 18.3]). To do this, we will prove that

$$\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H}\int_{u \in \mathbb{R}^M}\int_{s \in \mathbb{R}^M} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot l_y(q^*, Q^*) \cdot Q\, ds\, du\, dH\, dy < \infty \tag{46}$$

Since the integrand in the above is nonnegative, this would yield that it is integrable. Since Q is arbitrary, the same would apply if we replace it with $Q^*$. The integrand in (45), which is not necessarily nonnegative, is thus also integrable because it is obtained by subtracting from the integrand in (46) the same expression, with Q replaced by $Q^*$.

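Using $l_y(x, y) = x/y$, we may rewrite the left hand side of (46) as

$$\begin{aligned}
&\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H}\int_{u \in \mathbb{R}^M}\int_{s \in \mathbb{R}^M} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot \frac{q^*}{Q^*} \cdot Q\, ds\, du\, dH\, dy \\
&\quad = \int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H}\int_{u \in \mathbb{R}^M} \frac{Q}{Q^*}\left[\int_{s \in \mathbb{R}^M} f(s) f(y, H | s, x = \mathbf{f}(u, s)) \cdot q^*\, ds\right] du\, dH\, dy
\end{aligned} \tag{47}$$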



The inside of the brackets is equal to $q^*(y, H, u)$, defined to equal the marginal density of Y, H and U where the distribution of U given S is determined by the density $q^*$. Similarly defining $q^*(y, H)$, we obtain by the conditions of Lemma 1 that $q^*(y, H, u) = q^*(y, H) \cdot Q^*(u | y, H)$. Thus, (47) becomes,

$$\begin{aligned}
\int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H}\int_{u \in \mathbb{R}^M} \frac{Q}{Q^*} \cdot q^*(y, H) \cdot Q^*\, du\, dH\, dy
&= \int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} q^*(y, H)\int_{u \in \mathbb{R}^M} Q\, du\, dH\, dy \\
&= \int_{y \in \mathbb{R}^N}\int_{H \in \mathcal{R}_H} q^*(y, H) \cdot 1\, dH\, dy = 1 < \infty
\end{aligned}$$

Thus, by the above discussion, the order of integration in (45) can be changed, and we obtain (44). Coupled with (43), this proves that the right hand side of (42) is zero, concluding the proof of the lemma. □



Appendix IV 
The Achievability of $F(q_G, Q_G)$

The random variables U, S, Y, H that achieve the LAFP capacity are continuous. In practice one can only realize the Gelfand-Pinsker capacity of a set of discrete random variables. We now show that U, S, Y, H can be quantized to a set $\hat{U}, \hat{S}, \hat{Y}, \hat{H}$ of discrete random variables that can approach the LAFP capacity arbitrarily closely. The LAFP capacity is given by $R_{achievable} = F(q_G, Q_G)$ where F(q, Q) is defined by (11).

We create a quantized version as follows. Let $B_n(c, d)$ denote a cube in $\mathbb{R}^n$ with center c and side length d, i.e.,

$$B_n(c, d) = \left\{(x_1, x_2, \ldots, x_n) : c_i - d/2 \le x_i < c_i + d/2,\ i = 1, \ldots, n\right\}$$

We define discrete random variables $\hat{S}, \hat{U}, \hat{Y}, \hat{H}$ which are quantized versions of S, U, Y, H, respectively, as follows. Recall that M and N are the dimensions of S and Y, respectively. The dimension of H is thus M x N. Fix some $\epsilon > 0$ sufficiently small, and $\rho > 0$ sufficiently large. Let $s_i$, $i = 1, \ldots, N_s$ denote all the points in $\mathbb{R}^M$ such that $s_i \in B_M(0, \rho)$ and such that all the coordinates of $s_i$ are integer multiples of $\epsilon$. Similarly, let $u_j$, $j = 1, \ldots, N_u$, $y_k$, $k = 1, \ldots, N_y$ and $H_l$, $l = 1, \ldots, N_h$ denote all the points in $\mathbb{R}^M$, $\mathbb{R}^N$ and $\mathcal{R}_H$, such that $u_j \in B_M(0, \rho)$, $y_k \in B_N(0, \rho)$ and $H_l \in B_{MN}(0, \rho)$, and such that all the coordinates of $u_j$, $y_k$ and $H_l$ are integer multiples of $\epsilon$.

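We define by $\mathcal{S}_i$, $i = 0, 1, \ldots, N_s$ the following regions,

$$\mathcal{S}_i = \begin{cases} \mathbb{R}^M \cap B_M(s_i, \epsilon), & \text{if } i = 1, 2, \ldots, N_s; \\ \mathbb{R}^M \setminus \left[\bigcup_{i'=1}^{N_s} B_M(s_{i'}, \epsilon)\right], & \text{if } i = 0. \end{cases}$$

Similarly we define

$$\mathcal{U}_j = \begin{cases} \mathbb{R}^M \cap B_M(u_j, \epsilon), & \text{if } j = 1, 2, \ldots, N_u; \\ \mathbb{R}^M \setminus \left[\bigcup_{j'=1}^{N_u} B_M(u_{j'}, \epsilon)\right], & \text{if } j = 0. \end{cases}$$

$$\mathcal{Y}_k = \begin{cases} \mathbb{R}^N \cap B_N(y_k, \epsilon), & \text{if } k = 1, 2, \ldots, N_y; \\ \mathbb{R}^N \setminus \left[\bigcup_{k'=1}^{N_y} B_N(y_{k'}, \epsilon)\right], & \text{if } k = 0. \end{cases}$$

and

$$\mathcal{H}_l = \begin{cases} \mathcal{R}_H \cap B_{MN}(H_l, \epsilon), & \text{if } l = 1, 2, \ldots, N_h; \\ \mathcal{R}_H \setminus \left[\bigcup_{l'=1}^{N_h} B_{MN}(H_{l'}, \epsilon)\right], & \text{if } l = 0. \end{cases}$$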



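The quantized random variable $\hat{S}$ is defined as follows: $\hat{S} = i$ if $S \in \mathcal{S}_i$. The quantized random variables $\hat{U}$, $\hat{Y}$ and $\hat{H}$ are defined similarly. The joint probability of $\hat{S}, \hat{U}, \hat{Y}, \hat{H}$ is,

$$P\left(\hat{S} = i, \hat{U} = j, \hat{Y} = k, \hat{H} = l\right) = \int_{s \in \mathcal{S}_i}\int_{u \in \mathcal{U}_j}\int_{y \in \mathcal{Y}_k}\int_{H \in \mathcal{H}_l} f(s) f(y, H | s, x = \mathbf{f}(u, s))\, q_G(u | s)\, dH\, dy\, du\, ds$$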
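
The quantization rule can be sketched in a few lines. The function below is an illustrative assumption rather than part of the proof: it maps a point to the integer coordinates of the grid point whose half-open cube of side $\epsilon$ contains it, and returns `None` for the overflow region (index 0 in the text), i.e. when that grid point lies outside $B(0, \rho)$. The name `quantization_cell` is hypothetical.

```python
import numpy as np

def quantization_cell(point, eps, rho):
    """Integer grid coordinates of the cell containing `point`, or None for
    the overflow region (the index-0 region in the text)."""
    point = np.asarray(point, dtype=float)
    # Half-open cubes [c - eps/2, c + eps/2) centred on integer multiples of eps.
    grid = np.floor(point / eps + 0.5).astype(int)
    centre = grid * eps
    # Only grid points inside the cube B(0, rho) receive a cell of their own.
    inside = np.all((centre >= -rho / 2) & (centre < rho / 2))
    return tuple(int(g) for g in grid) if inside else None

cell_near_origin = quantization_cell([0.26, -0.04], eps=0.1, rho=2.0)
cell_overflow = quantization_cell([5.0, 0.0], eps=0.1, rho=2.0)
```

Returning integer coordinates instead of the cell centre avoids floating-point comparisons when cells are used as dictionary keys, for instance when estimating the joint probability above from samples.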

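The Gelfand-Pinsker achievable rate corresponding to the quantized random variables is,

$$\hat{R} = \sum_{i,j,k,l} P\left(\hat{S} = i, \hat{U} = j, \hat{Y} = k, \hat{H} = l\right)\log\frac{P(\hat{U} = j | \hat{Y} = k, \hat{H} = l)}{P(\hat{U} = j | \hat{S} = i)} \tag{48}$$

We claim that $\hat{R} = R_{achievable} + o_{\epsilon,\rho}(1)$ where $o_{\epsilon,\rho}(1)$ is a term that approaches 0 as $\epsilon \to 0$ and $\rho \to \infty$.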
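To see this, first note that when $s \in \mathcal{S}_0$ or $u \in \mathcal{U}_0$ or $y \in \mathcal{Y}_0$ or $H \in \mathcal{H}_0$, the contribution to $F(q_G, Q_G)$ in (11) approaches 0 as $\rho \to \infty$. In addition, $\log\frac{Q_G(u | y, H)}{q_G(u | s)}$ is uniformly continuous in the region $s \notin \mathcal{S}_0$, $u \notin \mathcal{U}_0$, $y \notin \mathcal{Y}_0$, $H \notin \mathcal{H}_0$. Hence,

$$F(q_G, Q_G) = \sum_{i,j,k,l} P\left(\hat{S} = i, \hat{U} = j, \hat{Y} = k, \hat{H} = l\right)\log\frac{Q_G(u_j | y_k, H_l)}{q_G(u_j | s_i)} + o_{\epsilon,\rho}(1)$$

In addition, by the uniform continuity of the Gaussian distribution in the region $s \notin \mathcal{S}_0$, $u \notin \mathcal{U}_0$, $y \notin \mathcal{Y}_0$, $H \notin \mathcal{H}_0$,

$$\frac{P(\hat{U} = j | \hat{Y} = k, \hat{H} = l)}{Q_G(u_j | y_k, H_l)} = 1 + o_{\epsilon,\rho}(1)$$

and

$$\frac{P(\hat{U} = j | \hat{S} = i)}{q_G(u_j | s_i)} = 1 + o_{\epsilon,\rho}(1)$$

Finally, by arguments similar to those indicated above, the contribution of terms with i = 0 or j = 0 or k = 0 or l = 0 in (48) is negligible.

Hence we obtained the desired claim that $\hat{R} = R_{achievable} + o_{\epsilon,\rho}(1)$.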



Appendix V 
Proof of Theorem 2

Our approach is the following. We begin with an assignment of variables for the LAFP achievable region. This means a set of variables $X_1, \ldots, X_L$ that are not necessarily independent, a set of matrices $F_1, \ldots, F_L$ and a set of auxiliary random variables $U_l = F_l S_l + X_l$ where $S_l = \sum_{i<l} X_i$. Recall that in our current context, $X = X_1 + \ldots + X_L$ denotes the transmitted symbol of the MIMO-BC channel, while $X_l$ denotes the transmitted signal to user l, equivalent to X as in Sec. III-B.

We will construct an alternative set of independent random variables $\hat{X}_1, \ldots, \hat{X}_L$ and matrices $\hat{F}_1, \ldots, \hat{F}_L$ such that the transmitted signal $\hat{X} = \hat{X}_1 + \ldots + \hat{X}_L = X_1 + \ldots + X_L = X$. Thus, the distribution of the actual transmitted signal is unchanged and satisfies the power constraint. Furthermore, we show that for similarly defined $\hat{U}_l = \hat{F}_l \hat{S}_l + \hat{X}_l$ and $\hat{S}_l = \sum_{i<l} \hat{X}_i$, the achievable rates satisfy $\hat{R}_l \ge R_l$, where

$$\hat{R}_l = I(\hat{U}_l; Y_l, H_l) - I(\hat{U}_l; \hat{S}_l), \qquad R_l = I(U_l; Y_l, H_l) - I(U_l; S_l)$$



A. Definition of $\hat{X}_1, \ldots, \hat{X}_L$

For each $l = 1, \ldots, L$, using Gram-Schmidt orthogonalization, $X_l$ can be written as $X_l = \Gamma_l S_l + X'_l$ where $\Gamma_l$ is a matrix and where $S_l$ and $X'_l$ are uncorrelated. Therefore, since we have assumed, in our definition of the LAFP region in Sec. IV-A, that all variables are jointly Gaussian, they are independent. With this definition,

$$\begin{aligned}
X &= S_L + X_L = (I + \Gamma_L) S_L + X'_L \\
&= (I + \Gamma_L)\left[S_{L-1} + X_{L-1}\right] + X'_L = (I + \Gamma_L)\left[(I + \Gamma_{L-1}) S_{L-1} + X'_{L-1}\right] + X'_L \\
&= (I + \Gamma_L) \cdots (I + \Gamma_2) X'_1 + (I + \Gamma_L) \cdots (I + \Gamma_3) X'_2 + \ldots + (I + \Gamma_L) X'_{L-1} + X'_L
\end{aligned}$$

We thus define $\hat{X}_l = G_l X'_l$ where $G_l = (I + \Gamma_L) \cdots (I + \Gamma_{l+1})$, $l = 1, \ldots, L - 1$, and $G_L = I$. By construction, $\sum_{l=1}^{L} \hat{X}_l = X = \sum_{l=1}^{L} X_l$, as desired.
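
The algebra of this construction can be checked numerically. In the sketch below, which is an illustration under stated assumptions, each signal is represented as $X_l = A_l w$ for a common standard Gaussian vector w, so that covariances are exact matrix products; the dimensions, the seed and all variable names are arbitrary. Only the algebraic identities are verified (the sum is preserved, $X'_l$ is uncorrelated with $S_l$, and $\hat{S}_l = G_{l-1} S_l$); the independence claims of the lemma that follows rely on the Markov structure assumed there and are not tested here.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, d = 4, 2, 8  # users, transmit antennas, dimension of the driving noise w

# X_l = A[l] @ w with w ~ N(0, I): jointly Gaussian, generally dependent signals.
A = [rng.standard_normal((M, d)) for _ in range(L)]

# S_l = sum_{i<l} X_i, expressed through the same linear maps (S for user 1 is 0).
S = [sum(A[:l], np.zeros((M, d))) for l in range(L)]

gammas, A_prime = [], []
for l in range(L):
    cov_xs = A[l] @ S[l].T                     # Cov(X_l, S_l)
    cov_ss = S[l] @ S[l].T                     # Cov(S_l)
    gamma = cov_xs @ np.linalg.pinv(cov_ss)    # pinv covers the all-zero S of user 1
    gammas.append(gamma)
    A_prime.append(A[l] - gamma @ S[l])        # X'_l = X_l - Gamma_l S_l

# G_l = (I + Gamma_L) ... (I + Gamma_{l+1}), with G_L = I (indices here are 0-based).
G = [np.eye(M) for _ in range(L)]
for l in range(L - 2, -1, -1):
    G[l] = G[l + 1] @ (np.eye(M) + gammas[l + 1])

A_hat = [G[l] @ A_prime[l] for l in range(L)]  # hat{X}_l = G_l X'_l
```

Because every quantity is a linear map of the same w, equality of the maps implies equality of the random vectors themselves, not just of their distributions.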

The following lemma summarizes some properties of our random variables.

Lemma 2: For all $l = 1, \ldots, L$,

1) $X'_l$ is independent of $X_1, \ldots, X_{l-1}$.

2) $\hat{X}_l$ is independent of $S_1, \ldots, S_l$.

3) $\hat{X}_l$ is independent of $\hat{X}_1, \ldots, \hat{X}_{l-1}$.

4) $\hat{S}_l = G_{l-1} S_l$

Proof: To prove property 1, observe that the following Markov relations hold: $X_1, \ldots, X_{l-1} \leftrightarrow S_l \leftrightarrow S_l, X_l \leftrightarrow X'_l$. $X'_l$, by construction, is independent of $S_l$. It is thus straightforward to verify, using this Markov relation, that it is also independent of $X_1, \ldots, X_{l-1}$. To obtain properties 2 and 3, observe that $S_1, \ldots, S_l$ and $\hat{X}_1, \ldots, \hat{X}_{l-1}$ are functions of $X_1, \ldots, X_{l-1}$ and thus are independent of $X'_l$.
The last property is easily obtained by induction. For l = 1,

$$\hat{S}_1 = \sum_{i<1} \hat{X}_i = 0 = \sum_{i<1} X_i = S_1$$

The rest is obtained by the following induction:

$$\begin{aligned}
\hat{S}_{l+1} &= \hat{S}_l + \hat{X}_l = G_{l-1} S_l + G_l X'_l = G_l (I + \Gamma_l) S_l + G_l X'_l = G_l\left[(I + \Gamma_l) S_l + X'_l\right] \\
&= G_l\left[S_l + \Gamma_l S_l + X'_l\right] = G_l\left[S_l + X_l\right] = G_l S_{l+1}
\end{aligned}$$

□

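B. Definition of $\hat{F}_1, \ldots, \hat{F}_L$

We have not yet defined $\hat{F}_l$. To do so, we first consider $G_l \cdot U_l$. By the definition of $U_l$,

$$G_l \cdot U_l = G_l\left[F_l S_l + X_l\right] = G_l\left[(F_l + \Gamma_l) S_l + X'_l\right] = G_l (F_l + \Gamma_l) S_l + \hat{X}_l \tag{49}$$

where the last equality was obtained by the definition of $\hat{X}_l$, above. Using Gram-Schmidt orthogonalization, $S_l$ can be written as $S_l = B_l \cdot \hat{S}_l + D_l$ where $B_l$ is a matrix and $D_l$ is uncorrelated with $\hat{S}_l$. Since the variables are jointly Gaussian, $D_l$ is also independent of $\hat{S}_l$. We proceed

$$\begin{aligned}
G_l \cdot U_l &= G_l (F_l + \Gamma_l)\left[B_l \hat{S}_l + D_l\right] + \hat{X}_l \\
&= G_l (F_l + \Gamma_l) B_l \hat{S}_l + \hat{X}_l + G_l (F_l + \Gamma_l) D_l
\end{aligned} \tag{50}$$

Comparing (50) with $\hat{U}_l = \hat{F}_l \hat{S}_l + \hat{X}_l$, we take $\hat{F}_l = G_l (F_l + \Gamma_l) B_l$, so that $G_l \cdot U_l = \hat{U}_l + G_l (F_l + \Gamma_l) D_l$.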
C. Proof of $\hat{R}_l \ge R_l$

Recall that $\hat{U}_l = \hat{F}_l \hat{S}_l + \hat{X}_l$. To prove $\hat{R}_l \ge R_l$, we first define an intermediate auxiliary variable $\tilde{U}_l = G_l \cdot U_l$. Since $\tilde{U}_l$ is a function of $U_l$, we have

$$\begin{aligned}
R_l &= I(\tilde{U}_l, U_l; Y_l, H_l) - I(\tilde{U}_l, U_l; S_l) \\
&= H(\tilde{U}_l, U_l | S_l) - H(\tilde{U}_l, U_l | Y_l, H_l) \\
&= H(\tilde{U}_l | S_l) - H(\tilde{U}_l | Y_l, H_l) + H(U_l | \tilde{U}_l, S_l) - H(U_l | \tilde{U}_l, Y_l, H_l) \\
&= \left[H(\tilde{U}_l | S_l) - H(\tilde{U}_l | Y_l, H_l)\right] + \left[I(U_l; \tilde{U}_l, Y_l, H_l) - I(U_l; \tilde{U}_l, S_l)\right]
\end{aligned}$$

We now wish to show that the contents of the second brackets are non-positive. For this purpose, we will show that the following Markov relations hold: $U_l \leftrightarrow \tilde{U}_l, S_l \leftrightarrow \tilde{U}_l, \hat{X}_l, \hat{S}_l \leftrightarrow \tilde{U}_l, X \leftrightarrow \tilde{U}_l, Y_l, H_l$. The desired result will then follow from the first and last Markov relations, using the data processing inequality.

The second relation (first Markov triple) follows from the fact that $\hat{X}_l$ and $\hat{S}_l$ may be determined from $\tilde{U}_l$ and $S_l$ by means of deterministic functions: $\hat{X}_l$ through (49), and $\hat{S}_l$, by Lemma 2, satisfies $\hat{S}_l = G_{l-1} S_l$. For the third relation, observe that $X = \hat{S}_l + \hat{X}_l + \sum_{i>l} \hat{X}_i$. By the above definition all $\{\hat{X}_i\}_{i>l}$ are independent of $U_l$, $\tilde{U}_l$, $S_l$, $\hat{S}_l$ and $\hat{X}_l$. Therefore this Markov relation holds. The last Markov relation is straightforward.

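We thus have,

$$R_l \le H(\tilde{U}_l | S_l) - H(\tilde{U}_l | Y_l, H_l) \tag{51}$$

Examining the first element of the above difference, we obtain:

$$\begin{aligned}
H(\tilde{U}_l | S_l) &= H(G_l (F_l + \Gamma_l) S_l + \hat{X}_l | S_l) = H(\hat{X}_l | S_l) = H(\hat{X}_l) = H(\hat{X}_l | \hat{S}_l) = H(\hat{F}_l \hat{S}_l + \hat{X}_l | \hat{S}_l) \\
&= H(\hat{U}_l | \hat{S}_l)
\end{aligned} \tag{52}$$

where the first equality follows from the definition of $\tilde{U}_l$ and from (49). The third equality follows from the independence of $\hat{X}_l$ and $S_l$ and the fourth from the independence of $\hat{X}_l$ and $\hat{S}_l$.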
Examining the second element of (51), we have

$$\begin{aligned}
H(\tilde{U}_l | Y_l, H_l) &= H(\hat{U}_l + G_l (F_l + \Gamma_l) D_l | Y_l, H_l) \ge H(\hat{U}_l + G_l (F_l + \Gamma_l) D_l | Y_l, H_l, D_l) \\
&= H(\hat{U}_l | Y_l, H_l, D_l) = H(\hat{U}_l | Y_l, H_l)
\end{aligned} \tag{53}$$

The first equality follows from (50) and the definitions of $\tilde{U}_l$, $\hat{U}_l$ and $\hat{F}_l$. The inequality results from the fact that conditioning cannot increase the entropy. To prove the last equality, we wish to show that $\hat{U}_l$ and $D_l$ are independent, given $Y_l$ and $H_l$.

$\hat{U}_l$ is a function of $\hat{S}_l$ and $\hat{X}_l$. Therefore, it suffices to show that $D_l$ is independent of these two random variables, given $Y_l$ and $H_l$. $D_l$ is independent of $\hat{S}_l$ by construction. In addition, $\hat{X}_l$ is independent of $S_l$, $\hat{S}_l$ and $D_l$, because $D_l$ is a function of $S_l$ and of $\hat{S}_l$, where $\hat{S}_l = G_{l-1} S_l$ (by Lemma 2), and $\hat{X}_l$ is independent of $S_l$ (again, by Lemma 2). Therefore, $D_l$ is independent of $\hat{S}_l$ and $\hat{X}_l$. To show that the independence is maintained even when we condition on $Y_l$ and $H_l$, we prove the following Markov chain relation: $D_l \leftrightarrow \hat{S}_l, \hat{X}_l \leftrightarrow \hat{S}_l, \hat{X}_l, \ldots, \hat{X}_L \leftrightarrow X \leftrightarrow H_l, Y_l$. The second relation (first Markov triple) holds because the random variables $\hat{X}_{l+1}, \ldots, \hat{X}_L$ are independent of $\hat{S}_l$ and of $\hat{X}_l$ by Lemma 2, and of $D_l$, by virtue of it being a function of $S_l$ and $\hat{S}_l$. The third relation holds because $X = \hat{S}_l + \hat{X}_l + \ldots + \hat{X}_L$. The fourth relation holds because $Y_l = H_l X + Z_l$ and $H_l$ and $Z_l$ are independent of the other random variables $D_l, \hat{S}_l, \hat{X}_l, \ldots, \hat{X}_L, X$.

Combining (51), (52) and (53) we obtain,

$$R_l \le H(\hat{U}_l | \hat{S}_l) - H(\hat{U}_l | Y_l, H_l) = I(\hat{U}_l; Y_l, H_l) - I(\hat{U}_l; \hat{S}_l) = \hat{R}_l$$

This completes the proof. □

Appendix VI 

Computing the LAFP Achievable Region when the Number of Users is Greater than Two 

In Sec. IV-B we considered the computation of the LAFP achievable region over a fading MIMO-BC channel where the number of users is two. In this appendix we briefly consider the case of more than two users. To obtain the LAFP achievable region, we could again (as in Sec. IV-B) apply a grid search to obtain $\{\Sigma_X^{(l)}\}_{l=1}^{L}$. A straightforward approach would be to compute, for each choice of such matrices, the achievable rates for each of the individual users by selecting the matrices F, for each user (except for the first, who does not have an associated F matrix) so as to maximize (35). However, the computational complexity of such an approach would grow exponentially with the number of users.

The following observation can be used to reduce the number of computations. The achievable rate for
user $l$ is a function of $\Sigma_{X_l}$ (the covariance matrix of its transmitted signal $X_l$), of $\Sigma_{S_l} = \sum_{i<l} \Sigma_{X_i}$ (the
covariance matrix of the interference $S_l = \sum_{i<l} X_i$) and $\Sigma_{\tilde{Z}_l} = \sum_{i>l} H_l \Sigma_{X_i} H_l^H + I$ (the covariance matrix
of the effective noise $\tilde{Z}_l = H_l \sum_{i>l} X_i + Z_l$). Thus, the achievable rate for user $l$ needs to be computed only
once for each of the possible choices of $\Sigma_{S_l}$, $\Sigma_{X_l}$ and $\Sigma_{\tilde{Z}_l}$, and not for each choice of $\{\Sigma_{X_i}\}_{i=1}^{L}$. A dynamic-
programming algorithm that relies on this observation can dramatically reduce the number of computations.
This approach is useful when the number of transmit antennas and the number of receive antennas of each user
are small (the number of users can be large). Otherwise we can resort to suboptimal methods for computing
the transmit covariances $\{\Sigma_{X_l}\}_{l=1}^{L}$ (and the $F$ matrices), e.g. using gradient descent or alternate maximization
that maximizes the sum rate with respect to two of the $\Sigma_{X_l}$ at a time, while fixing the others.
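
As a rough illustration of this observation, the following Python sketch replaces the covariance matrices by scalar powers on an integer grid, so that the triple of interference, signal and effective-noise powers of a user is fixed by the pair (s, x). The function `user_rate` is only a stand-in (the scalar dirty-paper rate over a fixed channel), not the LAFP rate maximized over the F matrices, and the names `best_sum_rate` and `brute_force_sum_rate` are hypothetical. The dynamic program evaluates each user's rate once per (s, x) pair and can be compared against an exhaustive search over full assignments.

```python
import math
from functools import lru_cache
from itertools import product


def user_rate(gain, s, x, z):
    """Stand-in rate (bits) for one user.

    s, x and z play the roles of the interference, signal and effective-noise
    powers. The formula is the scalar dirty-paper rate, used here only as a
    placeholder for the LAFP rate; it happens not to depend on s.
    """
    return math.log2(1.0 + gain * x / (1.0 + gain * z))


def best_sum_rate(gains, total):
    """Maximize the sum rate over integer power splits that add up to `total`."""
    n_users = len(gains)

    @lru_cache(maxsize=None)
    def best_from(user, s):
        # s = power already assigned to users 0..user-1 (the interference).
        if user == n_users - 1:
            # The last user takes the remaining power and sees no later users.
            return user_rate(gains[user], s, total - s, 0)
        best = -math.inf
        for x in range(total - s + 1):
            z = total - s - x  # power left for later users (effective noise)
            value = user_rate(gains[user], s, x, z) + best_from(user + 1, s + x)
            best = max(best, value)
        return best

    return best_from(0, 0)


def brute_force_sum_rate(gains, total):
    """Reference: evaluate every full power assignment separately."""
    n_users = len(gains)
    best = -math.inf
    for powers in product(range(total + 1), repeat=n_users):
        if sum(powers) != total:
            continue
        value = 0.0
        for user, x in enumerate(powers):
            s = sum(powers[:user])
            z = sum(powers[user + 1:])
            value += user_rate(gains[user], s, x, z)
        best = max(best, value)
    return best
```

In this toy setting the memoized recursion visits on the order of `total` squared states per user, whereas the exhaustive search grows exponentially with the number of users. With matrix-valued covariances the state would be a pair of grid matrices rather than a pair of integers.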

Appendix VII 

Computing a Cooperative Upper-Bound in our Setting 

Sato's upper bound [23] on the sum rate capacity (the maximum achievable sum-rate) of a broadcast channel 
relies on two observations: 
1) A fundamental assumption in the broadcast channel model is that the users are not able to cooperate in 
their decoding. Consider a virtual channel where the users are allowed to cooperate. The sum capacity in 
this channel is clearly an upper bound on the sum rate capacity of the true channel. Such a cooperative 
model is equivalent to transmission to a single virtual user, to whom all the outputs of the broadcast 
channel users are made available. 

2) The capacity region of a broadcast channel depends not on the joint distribution 
$\Pr(Y_1, H_1, \ldots, Y_L, H_L \mid X)$ but on the marginal distributions $\Pr(Y_1, H_1 \mid X), \ldots, \Pr(Y_L, H_L \mid X)$ 
alone. Thus, we may alter our model by introducing correlation between the noise signals and channel 
matrices of different users. As long as the marginal statistics of the individual channels to each of the 
users stay the same, the resulting broadcast channel's capacity region will remain unchanged. However, 
introducing correlations could alter (and tighten) the above-mentioned cooperative upper bound. 

Note that with any valid choice of correlation that we choose to introduce, the maximum cooperative sum-rate 
produces an upper bound on the broadcast channel's sum-rate capacity. We refer to such an upper bound as 
a cooperative upper bound. The Sato upper bound is the tightest such bound. 

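Consider the channel to the virtual single user corresponding to the fading MIMO-BC example of Sec. IV-B.
This user will observe a virtual channel matrix and a virtual noise defined as,

$$H = \begin{bmatrix} H^{(1)} \\ H^{(2)} \end{bmatrix} \quad \text{and} \quad Z = \begin{bmatrix} Z^{(1)} \\ Z^{(2)} \end{bmatrix}$$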

Our above discussion implies that we may freely introduce correlations as long as we do not alter the statistics 
of the channel observed by each of the individual users. We may thus introduce a correlation between the 
two noise signals $Z^{(1)}$ and $Z^{(2)}$, following the examples of [6] and [29]. We may also introduce correlation 
between the two channel matrices $H^{(1)}$ and $H^{(2)}$. Furthermore, we may introduce correlation between the 
channel matrix of one user and the noise of the other. 
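The possible values for $H$ are,

$$H_1 = \begin{bmatrix} 1 & 0.4 \\ 0.4 & 1 \end{bmatrix}, \quad H_2 = \begin{bmatrix} 1 & 0.4 \\ 3 & 1 \end{bmatrix}, \quad H_3 = \begin{bmatrix} 1 & 3 \\ 0.4 & 1 \end{bmatrix}, \quad H_4 = \begin{bmatrix} 1 & 3 \\ 3 & 1 \end{bmatrix}$$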

Let $p(H)$ denote the probability assignment to each of the above matrices. To preserve the marginal statistics 
of the channel to each of the individual users, we require that $p(H)$ satisfy the following constraints,

$$p(H_1) + p(H_2) = \frac{1}{2}$$

$$p(H_1) + p(H_3) = \frac{1}{2}$$

Furthermore, for p(H) to be a valid probability assignment, it must satisfy, 

$$p(H_1) + p(H_2) + p(H_3) + p(H_4) = 1$$

The constraints imply that $p(H)$ is completely described by $a = p(H_1)$. That is, for any $a \in [0, 1/2]$, we have

$$p(H_1) = p(H_4) = a, \qquad p(H_2) = p(H_3) = \frac{1}{2} - a$$

One way to introduce correlation between the various noise elements is to follow the approach of [6]. That 
is, introduce a correlation coefficient $\rho \in (-1, 1)$ and consider a virtual noise $Z$ whose covariance matrix is,

$$\mathrm{Cov}(Z) = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$$

However, a more general approach would introduce correlation between the virtual noise and the above virtual 
channel matrix in the following way: We will consider four correlation coefficients $\rho_1, \rho_2, \rho_3, \rho_4$ such that,

$$\mathrm{Cov}(Z \mid H = H_i) = \begin{bmatrix} 1 & \rho_i \\ \rho_i & 1 \end{bmatrix}, \quad i = 1, \ldots, 4$$

The channel noise observed by each of the users remains distributed as $\mathcal{N}(0, 1)$. Furthermore, each of the 
individual realizations of $Z^{(1)}$ and $Z^{(2)}$ remains independent of the respective channel matrices $H^{(1)}$ and $H^{(2)}$. 
Thus, the marginal statistics of the channels to each of the individual users remain unchanged, as desired. 
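
The claim that the marginal statistics are preserved can be checked mechanically. The short Python sketch below assumes, as the 2 x 2 matrices of the example suggest, that each user observes one row of the virtual channel matrix; the row values are read from the four matrices listed above, and all function and constant names are illustrative only. It confirms that, for any admissible value of the parameter a, each user still sees each of its two rows with probability 1/2, and that the conditional noise covariance keeps a unit diagonal.

```python
from fractions import Fraction

ROW_1A, ROW_1B = (1.0, 0.4), (1.0, 3.0)  # values taken by user 1's row
ROW_2A, ROW_2B = (0.4, 1.0), (3.0, 1.0)  # values taken by user 2's row

# The four virtual channel matrices H_1, ..., H_4 (user 1's row on top).
H = [
    (ROW_1A, ROW_2A),
    (ROW_1A, ROW_2B),
    (ROW_1B, ROW_2A),
    (ROW_1B, ROW_2B),
]


def joint_probabilities(a):
    """Return p(H_1), ..., p(H_4) for a parameter a in [0, 1/2]."""
    a = Fraction(a)
    half = Fraction(1, 2)
    if not 0 <= a <= half:
        raise ValueError("a must lie in [0, 1/2]")
    return [a, half - a, half - a, a]


def marginal(a, user):
    """Distribution of the row seen by `user` (0 or 1) under p(H)."""
    dist = {}
    for matrix, prob in zip(H, joint_probabilities(a)):
        row = matrix[user]
        dist[row] = dist.get(row, 0) + prob
    return dist


def conditional_noise_cov(rho):
    """Cov(Z | H = H_i) for a correlation coefficient rho in (-1, 1)."""
    if not -1 < rho < 1:
        raise ValueError("rho must lie in (-1, 1)")
    # The unit diagonal keeps each user's own noise variance equal to 1.
    return ((1.0, rho), (rho, 1.0))
```

For instance, `marginal(Fraction(1, 8), 0)` assigns probability 1/2 to each of user 1's two rows, exactly as in the uncorrelated model.
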
The capacity of the channel to the virtual user is now obtained by taking the maximum of,

$$I(X; Y, H) = I(X; H) + I(X; Y \mid H) = I(X; Y \mid H) = \sum_{i=1}^{4} p(H_i) \, I(X; Y \mid H = H_i)$$

The first equality is obtained by the chain rule for mutual information, and the second by the independence 
of $X$ and $H$. The distribution that maximizes the above is clearly Gaussian. Thus, 

$$C = \max_{\Sigma_X} \sum_{i=1}^{4} p(H_i) \log\det\left(I + A_i^{-1/2} H_i \Sigma_X H_i^H A_i^{-1/2}\right) \qquad (54)$$

where $A_i = \mathrm{Cov}(Z \mid H = H_i)$. 

We may now numerically obtain a cooperative upper bound in the following way. We consider all choices of 
$a, \rho_1, \ldots, \rho_4$ along a fine grid. For each such choice, we evaluate (54) by applying semidefinite programming 
to determine the $\Sigma_X$ that achieves the maximum. Each choice of $a, \rho_1, \ldots, \rho_4$ produces a cooperative bound. 
We conclude by selecting the lowest (tightest) bound.

In our numerical results (as presented in Sec. IV-B), the tightest bound was obtained by setting $a =$ and 
$\rho_1 = \rho_2 = \rho_3 = \rho_4 = 0.3$. Thus, the tightest bound was obtained with a limited exploitation of the available 
degrees of freedom in the above approach. 
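
The search described above can be sketched in a few lines of Python. This is only an illustration under several assumptions that are not taken from the text: the transmit power `power` is a free parameter with the constraint trace of the input covariance equal to `power`, the channel is treated as real-valued, rates are measured in bits, all four correlation coefficients are set equal, and a plain grid search over 2 x 2 covariance matrices is used in place of semidefinite programming. The function names are hypothetical. The sketch relies on the identity det(I + A^{-1/2} M A^{-1/2}) = det(A + M) / det(A), which avoids computing a matrix square root.

```python
import math


def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def rate_given_h(h, noise_cov, sigma):
    """log2 det(I + A^{-1/2} H Sigma H^T A^{-1/2}) for real 2 x 2 inputs."""
    # signal = H Sigma H^T
    hs = [[sum(h[i][k] * sigma[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    signal = [[sum(hs[i][k] * h[j][k] for k in range(2)) for j in range(2)]
              for i in range(2)]
    total = [[noise_cov[i][j] + signal[i][j] for j in range(2)]
             for i in range(2)]
    return math.log2(det2(total) / det2(noise_cov))


def max_average_rate(channels, probs, noise_covs, power, steps=40):
    """Grid search over Sigma = [[p1, c], [c, p2]] with p1 + p2 = power."""
    best = -math.inf
    for k in range(steps + 1):
        p1 = power * k / steps
        p2 = power - p1
        c_max = math.sqrt(p1 * p2)  # keeps Sigma positive semidefinite
        for j in range(steps + 1):
            c = c_max * (2.0 * j / steps - 1.0)
            sigma = [[p1, c], [c, p2]]
            value = sum(p * rate_given_h(h, cov, sigma)
                        for h, p, cov in zip(channels, probs, noise_covs))
            best = max(best, value)
    return best


def cooperative_bound(power, a_grid, rho_grid, steps=40):
    """Smallest bound found on the grid, with all rho_i set to one value."""
    channels = [
        [[1.0, 0.4], [0.4, 1.0]],
        [[1.0, 0.4], [3.0, 1.0]],
        [[1.0, 3.0], [0.4, 1.0]],
        [[1.0, 3.0], [3.0, 1.0]],
    ]
    best = math.inf
    for a in a_grid:
        probs = [a, 0.5 - a, 0.5 - a, a]
        for rho in rho_grid:
            noise_covs = [[[1.0, rho], [rho, 1.0]]] * 4
            value = max_average_rate(channels, probs, noise_covs, power, steps)
            best = min(best, value)
    return best
```

A call such as `cooperative_bound(10.0, [0.1, 0.25, 0.4], [0.0, 0.3, 0.6])` returns the smallest of the nine bounds on that coarse grid. Allowing four separate correlation coefficients only adds nested loops over `rho_grid`.
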

Note that the bound obtained in this way is not necessarily the true Sato upper bound (i.e., the tightest possible cooperative 
bound), because we have not proven that our approach exhausts all the possible ways of introducing valid correlations between the 
various signals. 